Command:    srun -n 128 ./ccsm.exe
Resources:  1 node (128 physical, 256 logical cores per node)
Memory:     251 GiB per node
Tasks:      128 processes, OMP_NUM_THREADS was 1
Machine:    b3238.betzy.sigma2.no
Start time: Tue Apr 20 21:03:59 2021
Total time: 658 seconds (about 11 minutes)
Full path:  /cluster/work/users/ingo/noresm/norcpm1_piControl_pecountS2_00010101/norcpm1_piControl_pecountS2_00010101_mem01/run

Summary: ccsm.exe is MPI-bound in this configuration
Compute:  45.6% |====|
MPI:      54.3% |====|
I/O:       0.1% ||
This application run was MPI-bound. A breakdown of this time and advice for investigating further is in the MPI section below.

CPU:
A breakdown of the 45.6% CPU time:
Single-core code:    99.9% |=========|
OpenMP regions:       0.1% ||
Scalar numeric ops:  24.0% |=|
Vector numeric ops:   8.3% ||
Memory accesses:     67.7% |======|
The per-core performance is memory-bound. Use a profiler to identify time-consuming loops and check their cache performance.
Little time is spent in vectorized instructions. Check the compiler's vectorization advice to see why key loops could not be vectorized.

MPI:
A breakdown of the 54.3% MPI time:
Time in collective calls:               73.3% |======|
Time in point-to-point calls:           26.7% |==|
Effective process collective rate:      1.57 MB/s
Effective process point-to-point rate:  110 MB/s
Most of the time is spent in collective calls with a very low transfer rate. This suggests load imbalance is causing synchronization overhead; use an MPI profiler to investigate.

I/O:
A breakdown of the 0.1% I/O time:
Time in reads:                 80.2% |=======|
Time in writes:                19.8% |=|
Effective process read rate:   263 MB/s
Effective process write rate:  723 MB/s
Most of the time is spent in read operations with an average effective transfer rate. It may be possible to achieve faster effective transfer rates using asynchronous file operations.

OpenMP:
A breakdown of the 0.1% time in OpenMP regions:
Computation:                 0.0% |
Synchronization:             0.0% |
Physical core utilization: 100.0% |=========|
System load:                99.6% |=========|
No measurable time is spent in OpenMP regions.

Memory:
Per-process memory usage may also affect scaling:
Mean process memory usage:   744 MiB
Peak process memory usage:  1001 MiB
Peak node memory usage:     45.0% |====|
The peak node memory usage is low. Running with fewer MPI processes and more data on each process may be more efficient.

Energy:
A breakdown of how energy was used:
CPU:              not supported
System:           not supported
Mean node power:  not supported
Peak node power:  0.00 W
Energy metrics are not available on this system. CPU metrics are not supported (no intel_rapl module).
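
The CPU section above reports little time in vectorized instructions and suggests checking the compiler's vectorization advice. The sketch below is illustrative only and is not part of ccsm.exe; the file name vec_check.c and the loop contents are made up, and which compiler the model was built with is not known from this report. With GCC the report can be requested with, e.g., -O2 -ftree-vectorize -fopt-info-vec-missed, and with the classic Intel compilers with -qopt-report=5 -qopt-report-phase=vec.

    /* vec_check.c - illustrative only: two loops to exercise the compiler's
     * vectorization report and see why a loop is (not) vectorized. */
    #include <stdio.h>

    #define N 1000000
    static double a[N], b[N], c[N];
    static int idx[N];

    int main(void) {
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; idx[i] = (i * 7) % N; }

        /* Contiguous, independent iterations: typically reported as vectorized. */
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        /* Indirect (gathered) accesses: often reported as not vectorized, or
         * vectorized inefficiently; the compiler report states the reason. */
        double s = 0.0;
        for (int i = 0; i < N; i++)
            s += a[idx[i]] * b[i];

        printf("%f %f\n", c[N / 2], s);
        return 0;
    }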
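
The MPI section above attributes most of the communication time to collectives with a very low effective transfer rate, which the report interprets as load imbalance: ranks arrive at collectives at different times and the early ones wait. The following is a minimal sketch of the measurement pattern an MPI profiler automates; the barrier, the MPI_Allreduce, and the payload are placeholders rather than calls taken from ccsm.exe, and in practice the timing would be placed around a real collective inside the application.

    /* imbalance_probe.c - minimal sketch (not part of ccsm.exe): time the wait
     * at a barrier placed just before a collective to expose how unevenly the
     * ranks arrive. Example launch (assumed): mpicc imbalance_probe.c && srun -n 128 ./a.out */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = 1.0, global = 0.0;

        /* Time blocked at the barrier: a large spread across ranks means the
         * slow ranks are making everyone else wait (load imbalance). */
        double t0 = MPI_Wtime();
        MPI_Barrier(MPI_COMM_WORLD);
        double wait = MPI_Wtime() - t0;

        t0 = MPI_Wtime();
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        double coll = MPI_Wtime() - t0;

        /* Collect the min/max wait so rank 0 can report the spread. */
        double wmin, wmax;
        MPI_Reduce(&wait, &wmin, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
        MPI_Reduce(&wait, &wmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("barrier wait: min %.6f s, max %.6f s; allreduce on rank 0: %.6f s\n",
                   wmin, wmax, coll);

        MPI_Finalize();
        return 0;
    }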
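
The I/O section above notes that asynchronous file operations may raise the effective transfer rate, although I/O is only 0.1% of this run. The sketch below shows the general technique with non-blocking MPI-IO; the file name input.dat, the offsets, and the buffer size are invented for illustration, and ccsm.exe's actual reads go through the model's own I/O layer, so this is not how the model itself would be modified.

    /* async_read.c - minimal sketch of overlapping a read with computation
     * using non-blocking MPI-IO. File name and sizes are illustrative only. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int count = 1 << 20;                    /* 1 Mi doubles per rank (illustrative) */
        double *buf = malloc(count * sizeof(double));

        MPI_File fh;
        MPI_Request req;
        /* "input.dat" is a placeholder; each rank reads its own contiguous slice. */
        if (MPI_File_open(MPI_COMM_WORLD, "input.dat", MPI_MODE_RDONLY,
                          MPI_INFO_NULL, &fh) != MPI_SUCCESS) {
            if (rank == 0) fprintf(stderr, "could not open input.dat\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        MPI_Offset offset = (MPI_Offset)rank * count * sizeof(double);

        /* Start the read, do unrelated work, then wait for completion. */
        MPI_File_iread_at(fh, offset, buf, count, MPI_DOUBLE, &req);

        double busywork = 0.0;
        for (int i = 0; i < 1000000; i++)             /* stands in for useful computation */
            busywork += (double)i * 1e-9;

        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        if (rank == 0)
            printf("read overlapped with computation (busywork=%f)\n", busywork);

        free(buf);
        MPI_Finalize();
        return 0;
    }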