hpc18

This cluster is intended for small and medium jobs (32 x86_64 cores, 256 GB memory). It has excellent memory speed (8x DDR4) but only 10GbE between nodes. Since 2020 it has also been used for preparing GPU jobs. To get access, log in using ssh (secure shell) to hpc18.urz.uni-magdeburg.de (reachable from the university intranet only). The operating system is CentOS Linux. This cluster is not suitable for personal data.

Hardware:

sample slurm-jobscript:

#!/bin/bash
#   this is a first draft (bad CPU pinning is a problem)
#
#SBATCH -J jobname1
#SBATCH -N 2 # number of nodes (110 GB per node), range: 1..12
#SBATCH --ntasks-per-node 32 # range: 1..32 (max 32 cores per node)
#SBATCH --time 0:59:00 # set 59min walltime
#SBATCH --mem 110000   # please use max. 110GB for better prioritization
#
exec 2>&1      # send errors into stdout stream
echo "DEBUG: SLURM_JOB_NODELIST=$SLURM_JOB_NODELIST"
echo "DEBUG: SLURM_NNODES=$SLURM_NNODES"
echo "DEBUG: SLURM_TASKS_PER_NODE=$SLURM_TASKS_PER_NODE"
#env | grep -e MPI -e SLURM
echo "DEBUG: host=$(hostname) pwd=$(pwd) ulimit=$(ulimit -v) \$1=$1 \$2=$2"
scontrol show Job $SLURM_JOBID  # show slurm-command and more for DBG

module load mpi/openmpi
module list
HOSTFILE=slurm-$SLURM_JOBID.hosts
scontrol show hostnames $SLURM_JOB_NODELIST > $HOSTFILE # one entry per host
awk '{print $1,"slots=32"}' $HOSTFILE > $HOSTFILE.2

echo "DEBUG: taskset= $(taskset -p $$)"
NPERNODE=$SLURM_NTASKS_PER_NODE
if [ -z "$NPERNODE" ];then NPERNODE=32; fi # default 32 mpi-tasks/node
echo "DEBUG: NPERNODE= $NPERNODE"

export OMP_NUM_THREADS=$((32 / NPERNODE))  # threads per MPI task ($[...] is deprecated bash syntax)
export OMP_WAIT_POLICY="PASSIVE"    # reduces OMP energy consumption

export OMPI_MCA_mpi_yield_when_idle=1  # untested, low energy OMPI ???
export OMPI_MCA_hwloc_base_binding_policy=none # workaround for the pinning problem???

# The default core binding is bad: two tasks get bound to the 2 hyperthreads of the same core.
# The SLURM_CPU_BIND setting below helps for srun only, not for direct mpirun (no MPI standard!?).
# Obsolete if hyperthreading is disabled in the BIOS.
TASKSET="taskset 0xffffffff"  # 2019-01: ok for 1 task x 32 threads and 32 tasks x 1 thread
# try to fix the 2 simplest cases here:
if [ "$NPERNODE" -eq 32 ]; then
 # one list entry per task (32 entries, CPUs 0..31)
 export SLURM_CPU_BIND=v,map_cpu:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
fi
if [ "$NPERNODE" -eq 1 ]; then
 TASKSET="taskset 0xffffffff" # 32-bit mask
fi

#mpirun -np 1 --report-bindings --oversubscribe -v --npernode $NPERNODE bash -c "taskset -p $$;ps aux"

# hybrid binary mpi+multithread
mpirun --report-bindings --oversubscribe -v --npernode $NPERNODE $TASKSET mpi-binary -t$OMP_NUM_THREADS
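
The pinning workaround above hard-codes both the OpenMP thread count and the affinity mask for a 32-core node. As a minimal sketch, the two values could be derived instead. The helper names threads_per_task and cpu_mask are hypothetical and not part of the original script; the value of 32 cores per node is taken from the cluster description.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original job script.
# Assumption: 32 cores per node, as stated in the cluster description.
CORES_PER_NODE=32

# OpenMP threads per MPI task, so that tasks * threads fills one node.
threads_per_task() {
    local npernode=$1
    echo $(( CORES_PER_NODE / npernode ))
}

# Hex affinity mask covering the first N logical CPUs, in taskset syntax.
cpu_mask() {
    local ncpus=$1
    printf '0x%x\n' $(( (1 << ncpus) - 1 ))
}

threads_per_task 8   # prints 4
cpu_mask 32          # prints 0xffffffff
```

With these helpers, TASKSET could be set to "taskset $(cpu_mask 32)". Whether that improves the pinning on these nodes is untested here.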


run with: sbatch jobfile
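
After submission, sbatch prints a confirmation line of the form "Submitted batch job <id>". The following sketch extracts the job ID from that line so the job can be followed afterwards. The helper name job_id_from_sbatch is hypothetical, and the output file name assumes the Slurm default (slurm-<jobid>.out in the submit directory), which applies because the sample script does not set --output.

```shell
#!/bin/bash
# Hypothetical helper: read sbatch's confirmation line
# ("Submitted batch job <id>") from stdin and print only the job ID.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Typical use on the cluster (commented out, needs a running Slurm):
#   jobid=$(sbatch jobfile | job_id_from_sbatch)
#   squeue -j "$jobid"            # queue state of this job
#   tail -f "slurm-$jobid.out"    # default output file if --output is not set
```

Because the sample script redirects stderr into stdout (exec 2>&1), error messages end up in the same output file.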

History/ChangeLog/News

2018-02    installation 12 nodes
2018-04    bad CPU pinning by slurm(?) or openmpi(?), workaround script
2018-05    set MTU=9000 (default 1500) to improve 10GbE network speed (200%)
2019-02    +5 nodes
2019-05-14 set overcommit_memory=2 to avoid Linux crashes under memory pressure
2019-06-12 power loss due to a short circuit during work on the room's electrical installation
2020-03-03 upgrade of 1GbE to 10GbE for the ssh-access

Further information on the central compute servers is available in the CMS or in the fall-back OvGU-HPC overview