URZ HPC-Cluster Sofja
[Photos: outside Sep 2021, night view, warm gangway]
Sofja - 288-node InfiniBand cluster (ca. 800 TFLOPs)
News:
HPC means "High Performance Computing" (German: "Hochleistungsrechnen"). The cluster "Sofja" is an HPC cluster for scientific use at the university. It is mainly intended for parallelized applications with high network communication and high memory demand, i.e. workloads which do not fit on a single workstation. It is based on Linux, the job scheduler Slurm, and MPI libraries for the high-speed network. The HPC cluster Sofja replaces the older HPC system Neumann.
Architecture: | 292 InfiniBand-connected ccNUMA nodes |
Processor (CPU): | 2 x 16c/32t Ice Lake Xeon 6326, base 2.9 GHz, max. turbo 3.5 GHz (non-AVX?), 512-bit vector support (AVX-512, 2 FMA, 2.4 GHz?), 32 FLOP/clock, 3 TFLOP/node (flt64), 8 memory channels/CPU at 3.2 GT/s, 205 GB/s/CPU, 185 W/CPU |
Board: | D50TNP1SB (4 boards per 2U chassis) |
Main memory (RAM): | 256 GB, 16 x 16 GB DDR4-3200 ECC, memory bandwidth 410 GB/s/node (4 fat nodes with 1024 GB/node) |
Storage (disks): | diskless compute nodes; 5 BeeGFS nodes, each with dm-crypt-encrypted 3 x (8+2 RAID6) x 4 TB, ca. 430 TB in total, ca. 2.4 GB/s per storage node; extended in 2022 to 10 nodes / 870 TB, ior results: home = 2.5 GB/s (1 OSS), scratch = 10.6 GB/s (9 OSS, 1 node), scratch = 20.4 GB/s (9 OSS, 2 nodes) |
Network: | Gigabit Ethernet (management), HDR/2 InfiniBand (100 Gb/s), non-blocking |
Power consumption: | ca. 180 kW max. (idle: ca. 54 kW) |
Performance data: | MemStream: 409 GB/s/node, Triad 71%; MPI: 12.3 GB/s/wire (alltoall uniform, best case); Peak = ... TFLOPs (... FLOP/Word, ... GF/W), see the rough estimate below |
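As a rough plausibility check of the peak figure, the per-node numbers from the table can be multiplied up on the shell. The sketch below is only an estimate under stated assumptions (288 compute nodes, 32 FLOP/clock per core at the 2.9 GHz base clock); it is not an official benchmark result.

# rough flt64 peak estimate from the table values (assumption: 288 compute nodes;
# sustained AVX-512 clocks may be lower, so treat this as an upper bound)
echo "2*16*32*2.9" | bc -l           # ~2970 GFLOP/s per node, i.e. ca. 3 TFLOP/node
echo "288*2*16*32*2.9/1000" | bc -l  # ~855 TFLOPs total, consistent with "ca. 800 TFLOPs"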
module avail                 # show available software modules (compilers etc.)
module list                  # show loaded modules (also: echo $LOADEDMODULES)
module load mpi/openmpi-4.1  # OpenMPI
module load ...              # libblas + liblapack
# you can put the "module load" lines for your favourite modules into ~/.bash_profile
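A typical compile-and-test session on the login node might look like the following sketch; the source file name mpi_hello.c is only a placeholder, and mpicc/mpirun are provided by the OpenMPI module.

module load mpi/openmpi-4.1         # puts mpicc and mpirun into $PATH
mpicc -O2 -o mpi_hello mpi_hello.c  # compile your MPI code (placeholder file name)
mpirun -np 2 ./mpi_hello            # short smoke test only; real runs go through sbatch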
#!/bin/bash
# please check http://www-e.uni-magdeburg.de/urzs/hpc21/ periodically 2021-11
#
# lines beginning with "#SBATCH" are instructions for the jobsystem (man slurm).
# lines beginning with "##SBATCH" are comments
#
#SBATCH -J job-01             # jobname displayed by squeue
#SBATCH -N 1                  # use 1 node
##SBATCH -N 4                 # use 4 nodes
# do not waste nodes (check scaling of your app), other users may need them
#SBATCH --ntasks-per-node 1   # 1 for multi-thread codes (using 32 cores)
#SBATCH --cpus-per-task 32    # 1 task/node  * 32 threads/task = 32 threads/node
##SBATCH --ntasks-per-node 2  # 2 for hybrid code, 2 tasks * 16 cores/task
##SBATCH --cpus-per-task 16   # 2 tasks/node * 16 threads/task = 32 threads/node
##...
##SBATCH --ntasks-per-node 32 # 32 for pure MPI code or 32 single-core apps
##SBATCH --cpus-per-task 1    # 32 tasks/node * 1 thread/task  = 32 threads/node
##SBATCH --cpu-bind=threads   # binding: 111.. 222... 333... per node
#SBATCH --time 01:00:00       # set 1h walltime (=maximum runtime), see sinfo
#SBATCH --mem 80000           # [MB/node], please use less than 120000 MB
# please use all cores of a node (especially for small jobs fitting on one node)
#
# most output is for simpler debugging (better support):
. /beegfs1/urz/utils/slurmProlog.sh   # output settings, check node health
#
# load modulefiles which set paths to mpirun and libs (see website)
echo "DEBUG: LOADEDMODULES=$LOADEDMODULES"   # module list
#module load gcc/...          # if you need gcc or gcc-libs on nodes, NA
#module load openblas/...     # multithread basic linear algebra, NA
module load mpi/openmpi-4.1   # message passing interface
#module load ansys            # Ansys simulations, license needed!, NA
echo "DEBUG: LOADEDMODULES=$LOADEDMODULES"   # module list
#
# --- please comment out and modify the part you will need! ---
# --- for MPI jobs and hybrid MPI/OpenMP jobs only ---
## set debug output for small test jobs only:
# [ "$SLURM_NNODES" ] && [ $SLURM_NNODES -lt 4 ] && mpidebug="--report-bindings"
#
# prepare nodefile for software using its own MPI (ansys/fluent, starccm++)
# self-compiled openmpi programs do not need the nodelist or hostfile
HOSTFILE=slurm-$SLURM_JOBID.hosts
scontrol show hostnames $SLURM_JOB_NODELIST > $HOSTFILE   # one entry per host
#
## please use $SCRATCH/$USER for big data, but /home/$USER for
## slurm output to get error messages in case scratch is over quota,
## check your quota before starting new jobs
# cd $SCRATCH/$USER
# mpirun -npernode $SLURM_NTASKS_PER_NODE $mpidebug\
#   --map-by slot:PE=$SLURM_CPUS_PER_TASK --bind-to core\
#   ./mpi_application
#
## for ((i=0;i<$SLURM_NPROCS;i++));do ./app1 $i;done   # serial version
# srun bash -c "./app1 \$SLURM_PROCID"                 # parallel version
#
# -------------------------- post-processing -----------
. /beegfs1/urz/utils/slurmEpilog.sh   # output final state, cleanup
#
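Stripped down from the template above, a minimal job script for a pure MPI run on one node could look like this sketch (the binary name mpi_application is a placeholder):

#!/bin/bash
#SBATCH -J mpi-test            # job name shown by squeue
#SBATCH -N 1                   # one node
#SBATCH --ntasks-per-node 32   # pure MPI: 32 single-threaded ranks per node
#SBATCH --cpus-per-task 1
#SBATCH --time 00:30:00        # 30 min walltime
#SBATCH --mem 80000            # MB per node
. /beegfs1/urz/utils/slurmProlog.sh   # site prolog: output settings, node check
module load mpi/openmpi-4.1
mpirun -npernode $SLURM_NTASKS_PER_NODE ./mpi_application   # placeholder binary
. /beegfs1/urz/utils/slurmEpilog.sh   # site epilog: final state, cleanup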
sinfo # list available queues/partitions
sbatch job.sh             # start job (stop using scancel _JobId_)
sbatch -p big job.sh      # start big job (max 140 nodes)
sbatch -p short job.sh    # short test jobs, max. runtime 1h
sbatch -p longrun job.sh  # if you have no other choice use this, minimize nodes
  # only one job allowed, nodes will be blocked for other users for a long time
  # ToDo: longrun jobs only for authorized projects, 1h limit otherwise
  # or think about checkpointing your application
# PLEASE do not flood partitions with your jobs (limit yourself to 10 jobs)
# better collect lots of small jobs into a bigger one (see the packing sketch below);
# please note that HPC clusters are mainly for big jobs which do not fit on single nodes
# PLEASE do not use the login node for computations, other users need it
squeue -u $USER           # show own job list
scancel -u $USER          # cancel all user jobs (running and pending)
squeue_all                # gives a better overview (more but compact info)
squeue_all -l             # incl. pending reason and allocated nodes (since 2018-03)
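To follow the advice above about collecting many small runs into one job, a packing job could look like the sketch below (app1 is a placeholder for a serial program that takes a task index as argument, as in the template):

#!/bin/bash
#SBATCH -J packed-job
#SBATCH -N 1
#SBATCH --ntasks-per-node 32   # 32 independent serial tasks share one node instead of 32 jobs
#SBATCH --cpus-per-task 1
#SBATCH --time 02:00:00
# each task receives its own SLURM_PROCID (0..31) as input parameter
srun bash -c './app1 $SLURM_PROCID'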
Access via ssh sofja.urz.uni-magdeburg.de (141.44.5.38) is only allowed from within the university IP range. Please use your university account name as the login. It is recommended to use ssh public keys for passwordless logins. Please send your ssh public key together with a short description of your project, the project duration, and the maximum amount of storage (in GB) you will probably need during your project. Students need an informal confirmation from their university tutor that they are allowed to use central HPC resources for science. This machine is not suited for work with personal data. If you use Windows and Exceed for (graphical) access, have a look at the Windows/ssh configuration hints.

Please note that the HPC storage is not intended for long-term data archiving. There is only some hardware redundancy (RAID6) and no backup. We explicitly do not make backups, in order not to reduce the performance of applications. You are therefore responsible for saving your data outside the HPC system. Please remove unneeded data to leave more space for others. Thanks!

For questions and problems please contact the administration via mailto:Joerg.Schulenburg+hpc21(at)URZ.Uni-Magdeburg.DE?subject=hpc-sofja or Tel. 58408 (German or English).
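For the recommended key-based login, a minimal client-side setup might look like the sketch below; the key file name and the host alias are only examples.

ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_sofja   # generate a key pair on your workstation
# send the resulting .pub file to the admin as described above,
# then add an entry like this to ~/.ssh/config:
#   Host sofja
#       HostName sofja.urz.uni-magdeburg.de
#       User your-university-account
#       IdentityFile ~/.ssh/id_ed25519_sofja
ssh sofja        # passwordless login once the public key is installed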
This is an incomplete list of projects on this cluster, to give you an impression of what the cluster is used for.
...
Author: Joerg Schulenburg, Uni-Magdeburg URZ, Tel. 58408 (2021-2026)