Command:    srun -n 128 ./ccsm.exe
Resources:  1 node (128 physical, 256 logical cores per node)
Memory:     251 GiB per node
Tasks:      128 processes, OMP_NUM_THREADS was 1
Machine:    b3238.betzy.sigma2.no
Start time: Tue Apr 20 21:03:59 2021
Total time: 658 seconds (about 11 minutes)
Full path:  /cluster/work/users/ingo/noresm/norcpm1_piControl_pecountS2_00010101/norcpm1_piControl_pecountS2_00010101_mem01/run

Summary: ccsm.exe is MPI-bound in this configuration
Compute:  45.6% |====|
MPI:      54.3% |====|
I/O:       0.1% ||
This application run was MPI-bound. A breakdown of this time and advice for investigating further is in the MPI section below.

CPU:
A breakdown of the 45.6% CPU time:
Single-core code:    99.9% |=========|
OpenMP regions:       0.1% ||
Scalar numeric ops:  24.0% |=|
Vector numeric ops:   8.3% ||
Memory accesses:     67.7% |======|
The per-core performance is memory-bound. Use a profiler to identify time-consuming loops and check their cache performance.
Little time is spent in vectorized instructions. Check the compiler's vectorization advice to see why key loops could not be vectorized.

MPI:
A breakdown of the 54.3% MPI time:
Time in collective calls:               73.3% |======|
Time in point-to-point calls:           26.7% |==|
Effective process collective rate:      1.57 MB/s
Effective process point-to-point rate:  110 MB/s
Most of the time is spent in collective calls with a very low transfer rate. This suggests load imbalance is causing synchronization overhead; use an MPI profiler to investigate.

I/O:
A breakdown of the 0.1% I/O time:
Time in reads:                 80.2% |=======|
Time in writes:                19.8% |=|
Effective process read rate:   263 MB/s
Effective process write rate:  723 MB/s
Most of the time is spent in read operations with an average effective transfer rate. It may be possible to achieve faster effective transfer rates using asynchronous file operations.

OpenMP:
A breakdown of the 0.1% time in OpenMP regions:
Computation:                 0.0% |
Synchronization:             0.0% |
Physical core utilization: 100.0% |=========|
System load:                99.6% |=========|
No measurable time is spent in OpenMP regions.

Memory:
Per-process memory usage may also affect scaling:
Mean process memory usage:   744 MiB
Peak process memory usage:  1001 MiB
Peak node memory usage:     45.0% |====|
The peak node memory usage is low. Running with fewer MPI processes and more data on each process may be more efficient.

Energy:
A breakdown of how energy was used:
CPU:              not supported
System:           not supported
Mean node power:  not supported
Peak node power:  0.00 W
Energy metrics are not available on this system. CPU metrics are not supported (no intel_rapl module).
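
The CPU section above reports little time in vectorized instructions and suggests checking the compiler's vectorization advice. The sketch below is illustrative only and is not part of ccsm.exe; the file name vec_check.c and the loop contents are made up, and which compiler the model was built with is not known from this report. With GCC the report can be requested with, e.g., -O2 -ftree-vectorize -fopt-info-vec-missed, and with the classic Intel compilers with -qopt-report=5 -qopt-report-phase=vec.

    /* vec_check.c - illustrative only: two loops to exercise the compiler's
     * vectorization report and see why a loop is (not) vectorized. */
    #include <stdio.h>

    #define N 1000000
    static double a[N], b[N], c[N];
    static int idx[N];

    int main(void) {
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; idx[i] = (i * 7) % N; }

        /* Contiguous, independent iterations: typically reported as vectorized. */
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        /* Indirect (gathered) accesses: often reported as not vectorized, or
         * vectorized inefficiently; the compiler report states the reason. */
        double s = 0.0;
        for (int i = 0; i < N; i++)
            s += a[idx[i]] * b[i];

        printf("%f %f\n", c[N / 2], s);
        return 0;
    }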
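
The MPI section above attributes most of the communication time to collectives with a very low effective transfer rate, which the report interprets as load imbalance: ranks arrive at collectives at different times and the early ones wait. The following is a minimal sketch of the measurement pattern an MPI profiler automates; the barrier, the MPI_Allreduce, and the payload are placeholders rather than calls taken from ccsm.exe, and in practice the timing would be placed around a real collective inside the application.

    /* imbalance_probe.c - minimal sketch (not part of ccsm.exe): time the wait
     * at a barrier placed just before a collective to expose how unevenly the
     * ranks arrive. Example launch (assumed): mpicc imbalance_probe.c && srun -n 128 ./a.out */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = 1.0, global = 0.0;

        /* Time blocked at the barrier: a large spread across ranks means the
         * slow ranks are making everyone else wait (load imbalance). */
        double t0 = MPI_Wtime();
        MPI_Barrier(MPI_COMM_WORLD);
        double wait = MPI_Wtime() - t0;

        t0 = MPI_Wtime();
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        double coll = MPI_Wtime() - t0;

        /* Collect the min/max wait so rank 0 can report the spread. */
        double wmin, wmax;
        MPI_Reduce(&wait, &wmin, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
        MPI_Reduce(&wait, &wmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("barrier wait: min %.6f s, max %.6f s; allreduce on rank 0: %.6f s\n",
                   wmin, wmax, coll);

        MPI_Finalize();
        return 0;
    }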
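
The I/O section above notes that asynchronous file operations may raise the effective transfer rate, although I/O is only 0.1% of this run. The sketch below shows the general technique with non-blocking MPI-IO; the file name input.dat, the offsets, and the buffer size are invented for illustration, and ccsm.exe's actual reads go through the model's own I/O layer, so this is not how the model itself would be modified.

    /* async_read.c - minimal sketch of overlapping a read with computation
     * using non-blocking MPI-IO. File name and sizes are illustrative only. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int count = 1 << 20;                    /* 1 Mi doubles per rank (illustrative) */
        double *buf = malloc(count * sizeof(double));

        MPI_File fh;
        MPI_Request req;
        /* "input.dat" is a placeholder; each rank reads its own contiguous slice. */
        if (MPI_File_open(MPI_COMM_WORLD, "input.dat", MPI_MODE_RDONLY,
                          MPI_INFO_NULL, &fh) != MPI_SUCCESS) {
            if (rank == 0) fprintf(stderr, "could not open input.dat\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        MPI_Offset offset = (MPI_Offset)rank * count * sizeof(double);

        /* Start the read, do unrelated work, then wait for completion. */
        MPI_File_iread_at(fh, offset, buf, count, MPI_DOUBLE, &req);

        double busywork = 0.0;
        for (int i = 0; i < 1000000; i++)             /* stands in for useful computation */
            busywork += (double)i * 1e-9;

        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        if (rank == 0)
            printf("read overlapped with computation (busywork=%f)\n", busywork);

        free(buf);
        MPI_Finalize();
        return 0;
    }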