This cluster is intended for small and medium-sized jobs (32 x86_64 cores, 256 GB memory). It offers high memory bandwidth (8x DDR4) but only 10GbE between the nodes. Since 2020 it is also used for the preparation of GPU jobs. Access is via ssh (secure shell) login to hpc18.urz.uni-magdeburg.de (reachable from within the university network only). The operating system is CentOS Linux. This cluster is not suited for personal data.
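A minimal access sketch, assuming a personal university account name "myuser" (a placeholder) and a client inside the university network:

  ssh myuser@hpc18.urz.uni-magdeburg.de                 # interactive login (intra-uni network only)
  scp data.tar.gz myuser@hpc18.urz.uni-magdeburg.de:~/  # copy input data to your home directory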
#!/bin/bash
# this is a first draft (bad CPU pinning is a known problem)
#
#SBATCH -J jobname1
#SBATCH -N 2                    # number of nodes, 110 GB per node, node range: 1..12
# options cpus-per-task and cpu-bind added on 2024-11-11
#SBATCH --ntasks-per-node 1     #  1 for multi-threaded codes (using 32 cores)
#SBATCH --cpus-per-task 32      #  1 task/node   * 32 threads/task = 32 threads/node
##SBATCH --ntasks-per-node 2    #  2 for hybrid code, 2 tasks * 16 cores/task
##SBATCH --cpus-per-task 16     #  2 tasks/node  * 16 threads/task = 32 threads/node
##...
##SBATCH --ntasks-per-node 32   # 32 for pure MPI code or 32 single-core apps
##SBATCH --cpus-per-task 1      # 32 tasks/node  *  1 thread/task  = 32 threads/node
##SBATCH --cpu-bind=threads     # binding: 111... 222... 333... per node
#SBATCH --time 0:59:00          # set 59 min walltime
#SBATCH --mem 110000            # please use max. 110 GB for better prioritisation
#
exec 2>&1                       # send errors into the stdout stream
echo "DEBUG: SLURM_JOB_NODELIST=$SLURM_JOB_NODELIST"
echo "DEBUG: SLURM_NNODES=$SLURM_NNODES"
echo "DEBUG: SLURM_TASKS_PER_NODE=$SLURM_TASKS_PER_NODE"
#env | grep -e MPI -e SLURM
echo "DEBUG: host=$(hostname) pwd=$(pwd) ulimit=$(ulimit -v) \$1=$1 \$2=$2"
scontrol show Job $SLURM_JOBID  # show the slurm command line and more, for debugging
/usr/local/bin/quota_all        # show quotas (added Feb 2022); the node NFS sees no quota!
echo "ulimit -v = $(ulimit -v)" # may be relevant when debugging low-memory problems
module load mpi/openmpi
module list
HOSTFILE=slurm-$SLURM_JOBID.hosts
scontrol show hostnames $SLURM_JOB_NODELIST > $HOSTFILE   # one entry per host
awk '{print $1,"slots=32"}' $HOSTFILE > $HOSTFILE.2
echo "DEBUG: taskset= $(taskset -p $$)"
NPERNODE=$SLURM_NTASKS_PER_NODE
if [ -z "$NPERNODE" ]; then NPERNODE=32; fi   # default: 32 MPI tasks/node
echo "DEBUG: NPERNODE= $NPERNODE"
export OMP_NUM_THREADS=$((32/NPERNODE))
export OMP_WAIT_POLICY="PASSIVE"                # reduces OpenMP energy consumption
export OMPI_MCA_mpi_yield_when_idle=1           # untested, low-energy OpenMPI ???
export OMPI_MCA_hwloc_base_binding_policy=none  # pinning-problem workaround ???
# the default core binding is bad: two tasks get bound to 2 hyperthreads of the same core,
# but this helps for srun only, not for direct mpirun (not covered by the MPI standard!?);
# obsolete if hyperthreading is disabled in the BIOS
TASKSET="taskset 0xffffffff"    # 2019-01 ok for 1*32t, 32*1t
# try to fix the 2 simplest cases here:
if [ "$NPERNODE" == 32 ]; then
  export SLURM_CPU_BIND=v,map_cpu:1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
fi
if [ "$NPERNODE" == 1 ]; then
  TASKSET="taskset 0xffffffff"  # 32-bit mask
fi
#mpirun -np 1 --report-bindings --oversubscribe -v --npernode $NPERNODE bash -c "taskset -p $$;ps aux"
# hybrid binary mpi+multithread
mpirun --report-bindings --oversubscribe -v --npernode $NPERNODE $TASKSET mpi-binary -t$OMP_NUM_THREADS

Run with: sbatch jobfile
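A typical submit-and-check sequence could look like the sketch below; the jobfile name, the job ID 12345 and the output file name are placeholders, and all commands are standard Slurm/Linux tools rather than anything cluster-specific.

  sbatch jobfile              # submit; prints "Submitted batch job 12345"
  squeue -u $USER             # list your pending and running jobs
  scontrol show job 12345     # detailed state of one job (replace 12345 with your job ID)
  less slurm-12345.out        # inspect the job output (default file name slurm-<jobid>.out)
  scancel 12345               # cancel the job if something went wrong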
2018-02    installation of 12 nodes
2018-04    bad CPU pinning by slurm(?)/openmpi(?), workaround script
2018-05    set MTU=9000 (default 1500) to improve 10GbE network speed (200%)
2019-02    +5 nodes
2019-05-14 set overcommit_memory=2 to avoid Linux crashes under memory pressure
2019-06-12 power loss due to a short circuit during electrical work in the room
2020-03-03 upgrade of the ssh-access link from 1GbE to 10GbE
2020-10-14 system reconfiguration due to unstable 10GbE in progress
2021-06-15 slurm partition reconfiguration (about one week of testing)
2021-07-07 network and configuration problems after a system update
2021-08-09 remote shutdown tests for 2021-08-16
2021-08-16 planned maintenance, no air conditioning, 12h downtime
2022-02-23 quota_all added, showing user quotas
2022-05-16 downtime due to a slurm security update
2023-02-13 node[08,16] down (node16 CE-ECC CPU1-G2, node08 no SEL)
2023-06-21 fixed default routing issues on node01 + node03
2023-06-21 outgoing worldwide traffic is now firewalled, ask the admin if needed (security)
2023-06-21 dbus.service and rpcbind deactivated on the nodes (security, stability)
2023-11-08 cluster down for maintenance, network upgrade, node01 memory 1TB; estimated downtime 24 to 48 hours
2023-11-09 further 24h downtime to fix performance and software issues
2023-11-20 investigating instability of node01; instability of node02 and node08 was identified as a memory problem and fixed; node13-17 have corrupt SEL entries "08/31/2018 02:20:10 Unknown #0xff" between other entries from after 2020-01
2023-11-22 instability of node08 partly reappeared, like node01 (does not power on after off/reset until the DIMM slots are changed, no error logs, sometimes hangs at DXE--CPU Initialization, also with the 10G+100Gb cards removed); happens with a single DIMM too, sometimes every second reset fails, not DIMM-dependent; note: the BMC works without any DIMM installed, so it must have its own local memory
2023-12-12 node01 1TB memory available (sbatch -N1 --mem 960000 ...)
2024-04-23 SMT re-enabled (+30% speed, but do not use for MPI)
2024-09-03 19:42 node01 Linux kernel panic/crash
2024-10-09 15:44 node01 Linux crashed, reboot + memory testing
2024-10-17 node01: found single-bit errors on a 64GB DDR4 DIMM; DIMM removed, still crashes (more frequently, C-state related?); testing minfreq=1800MHz instead of the lowest 1200MHz to get a higher CPU core voltage (still stable after 13 days; CPU aged?)
2024-10-29 node01: replacement 64GB DDR4 DIMM installed
2024-11-06 new Slurm SelectType=select/cons_res, allowing shared resources