Command:        srun -n 64 ./ccsm.exe
Resources:      1 node (128 physical, 256 logical cores per node)
Memory:         251 GiB per node
Tasks:          64 processes, OMP_NUM_THREADS was 1
Machine:        b5195.betzy.sigma2.no
Start time:     Tue Apr 20 19:39:28 2021
Total time:     419 seconds (about 7 minutes)
Full path:      /cluster/work/users/ingo/noresm/norcpm1_piControl_pecountS_00010101/norcpm1_piControl_pecountS_00010101_mem01/run

Summary: ccsm.exe is Compute-bound in this configuration
Compute:                                     64.3% |=====|
MPI:                                         35.5% |===|
I/O:                                          0.3% ||
This application run was Compute-bound. A breakdown of this time and advice for investigating further are in the CPU section below.
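The summary percentages can be turned into approximate wall-clock seconds by scaling them against the reported total time. The following is a minimal sketch that uses only the figures printed above; the function name `seconds_per_category` is hypothetical and not part of the reporting tool. The three percentages add up to 100.1% because of rounding in the report, so the resulting seconds are estimates.

```python
# Figures copied from the report header and summary above.
TOTAL_SECONDS = 419
SUMMARY_PERCENT = {"Compute": 64.3, "MPI": 35.5, "I/O": 0.3}


def seconds_per_category(total_seconds, percent_by_category):
    """Scale each percentage of the run against the total wall-clock time."""
    return {
        name: total_seconds * percent / 100.0
        for name, percent in percent_by_category.items()
    }


# Print one line per category with the estimated seconds.
for name, seconds in seconds_per_category(TOTAL_SECONDS, SUMMARY_PERCENT).items():
    print(f"{name:8s} {seconds:7.1f} s")
```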

CPU:
A breakdown of the 64.3% CPU time:
Single-core code:                            99.9% |=========|
OpenMP regions:                               0.1% ||
Scalar numeric ops:                          30.6% |==|
Vector numeric ops:                           7.1% ||
Memory accesses:                             62.3% |=====|
The per-core performance is memory-bound. Use a profiler to identify time-consuming loops and check their cache performance.
Little time is spent in vectorized instructions. Check the compiler's vectorization advice to see why key loops could not be vectorized.

MPI:
A breakdown of the 35.5% MPI time:
Time in collective calls:                    36.7% |===|
Time in point-to-point calls:                63.3% |=====|
Effective process collective rate:            10.1 MB/s
Effective process point-to-point rate:         158 MB/s
Most of the MPI time is spent in point-to-point calls, which achieve an average effective transfer rate. Using larger messages and overlapping communication with computation may increase the effective transfer rate.
The collective transfer rate is low. This can be caused by inefficient message sizes, such as many small messages, or by imbalanced workloads causing processes to wait.
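The percentages in the MPI breakdown are relative to the 35.5% MPI time, not to the whole run. The sketch below multiplies the two levels to estimate how much of the total run each kind of call accounts for. The helper `share_of_run` is a hypothetical name introduced for illustration, and the results inherit the rounding of the reported figures.

```python
# Figures copied from the report: total time, MPI share, and MPI breakdown.
TOTAL_SECONDS = 419
MPI_PERCENT = 35.5
MPI_BREAKDOWN = {"collective": 36.7, "point-to-point": 63.3}


def share_of_run(parent_percent, child_percent):
    """Convert a percentage of a category into a percentage of the whole run."""
    return parent_percent * child_percent / 100.0


# Print the share of the whole run and the estimated seconds for each call type.
for name, percent in MPI_BREAKDOWN.items():
    overall = share_of_run(MPI_PERCENT, percent)
    seconds = TOTAL_SECONDS * overall / 100.0
    print(f"{name:15s} {overall:5.1f}% of run, about {seconds:5.1f} s")
```

The same calculation applies to the CPU breakdown above, for example memory accesses at 62.3% of the 64.3% CPU time.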

I/O:
A breakdown of the 0.3% I/O time:
Time in reads:                               86.3% |========|
Time in writes:                              13.7% ||
Effective process read rate:                   198 MB/s
Effective process write rate:                  840 MB/s
Most of the I/O time is spent in read operations, which achieve an average effective transfer rate. It may be possible to achieve faster effective transfer rates using asynchronous file operations.

OpenMP:
A breakdown of the 0.1% time in OpenMP regions:
Computation:                                  0.0% |
Synchronization:                              0.0% |
Physical core utilization:                   50.0% |====|
System load:                                 50.1% |====|
Physical core utilization is low and some cores may be unused. Try increasing OMP_NUM_THREADS to improve performance.
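The 50.0% physical core utilization is consistent with 64 single-threaded processes on a node with 128 physical cores. The sketch below reproduces that figure and shows what a second thread per process would give, assuming one thread is pinned to each physical core. That placement is an assumption, and the report does not confirm that ccsm.exe would benefit from more OpenMP threads (only 0.1% of CPU time is in OpenMP regions). The function name is hypothetical.

```python
# Figures copied from the report header.
TASKS = 64
PHYSICAL_CORES = 128


def physical_core_utilization(tasks, threads_per_task, physical_cores):
    """Percentage of physical cores occupied, assuming one thread per core."""
    busy_cores = min(tasks * threads_per_task, physical_cores)
    return 100.0 * busy_cores / physical_cores


# Compare the reported configuration (1 thread) with a hypothetical 2 threads.
for threads in (1, 2):
    used = physical_core_utilization(TASKS, threads, PHYSICAL_CORES)
    print(f"OMP_NUM_THREADS={threads}: {used:.1f}% of physical cores")
```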

Memory:
Per-process memory usage may also affect scaling:
Mean process memory usage:                     734 MiB
Peak process memory usage:                    1015 MiB
Peak node memory usage:                      24.0% |=|
The peak node memory usage is very low. Running with fewer MPI processes and more data on each process may be more efficient.
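As a rough cross-check of the node memory figure, the per-process values can be multiplied by the number of processes and compared with the 251 GiB of node memory. This is only a sketch: it assumes all 64 processes run on the single node listed above, and it ignores memory used by the operating system and by anything other than the application processes. The function name is hypothetical.

```python
# Figures copied from the report.
TASKS = 64
NODE_MEMORY_GIB = 251
MEAN_PROCESS_MIB = 734
PEAK_PROCESS_MIB = 1015


def node_memory_percent(tasks, per_process_mib, node_memory_gib):
    """Percentage of node memory used if every process uses per_process_mib."""
    node_memory_mib = node_memory_gib * 1024
    return 100.0 * tasks * per_process_mib / node_memory_mib


mean_estimate = node_memory_percent(TASKS, MEAN_PROCESS_MIB, NODE_MEMORY_GIB)
peak_bound = node_memory_percent(TASKS, PEAK_PROCESS_MIB, NODE_MEMORY_GIB)
print(f"All processes at mean usage: {mean_estimate:.1f}% of node memory")
print(f"All processes at peak usage: {peak_bound:.1f}% of node memory")
```

The second figure assumes every process reaches the highest reported per-process peak at the same moment, so it lands slightly above the reported 24.0% peak node usage.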

Energy:
A breakdown of how energy was used:
CPU:                                      not supported
System:                                   not supported
Mean node power:                          not supported
Peak node power:                              0.00 W
Energy metrics are not available on this system.
CPU metrics are not supported (no intel_rapl module)