URZ HPC-Cluster Sofja


rack 6 front, cold side
outside-Sep21, night-view, warm-gangway

Sofja - 288-node InfiniBand cluster (ca. 800 TFLOPs)

News:

  • Nov 2021 - our switch-on plan is listed below (dates not final yet)
    • 24.11.2021 - renaming HPC21 to Sofja
    • 24.11.2021 - 67 active users of t100-hpc copied incl. ssh-authkeys
    • 25.11.2021 - new HPC21/Sofja system available for users
    • 29.11.2021 - switching off old HPC-cluster Dec2015-Nov2021
  • See History/Timeline for photos of the progress

Short description

HPC means "High Performance Computing" (German: "Hochleistungsrechnen"). The cluster "Sofja" is an HPC cluster for scientific use at the university. It is mainly intended for parallelized applications with high network communication and high memory demand, i.e. problems that do not fit on a single workstation. It is based on Linux, the job scheduler Slurm, and the MPI library for the high-speed network. The HPC cluster Sofja replaces the older HPC system Neumann.

Hardware

Architecture: 292 InfiniBand-connected ccNUMA nodes
Processor (CPU): 2 x 16c/32t Ice Lake Xeon 6326, base=2.9GHz, max.turbo=3.5GHz (no_avx?), 512-bit vector support (AVX512 2FMA 2.4GHz?), 32 FLOP/clock, 3 TFLOP/node (flt64), 8 memory channels/CPU at 3.2GT/s each, 205 GB/s/CPU, 185 W/CPU
Board: D50TNP1SB (4 boards per 2U chassis)
Main Memory (RAM): 256 GB per node, 16 x 16GB DDR4-3200 (3.2GT/s) ECC, memory bandwidth 410 GB/s/node (4 fat nodes with 1024GB/node)
Storage (disks): diskless compute nodes, 5 x BeeGFS nodes with dm-encrypted 3*(8+2 RAID6) * 4TB each, ca. 430TB in total, ca. 2.4GB/s per storage node; extended in 2022 to 10 nodes with 870TB
ior-results: home=2.5GB/s (1oss), scratch=10.6GB/s (9oss,1node), scratch=20.4GB/s (9oss,2nodes)
Network: Gigabit Ethernet (management), HDR/2 InfiniBand (100Gb/s), non-blocking
Power consumption: ca. 180kW max. (idle: ca. 54kW)
Performance data: MemStream: 409 GB/s/node, Triad 71%
MPI: 12.3 GB/s/wire (alltoall uniform, best case)
Peak = ... TFLOPs (... FLOP/Word, ... GF/W)
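
The per-node figures in the hardware list can be cross-checked with simple arithmetic. The following Python sketch recomputes the flt64 peak per node, the memory bandwidth per CPU, and the RAID6 data capacity from the values quoted above. The 8 bytes per memory transfer (64-bit DDR4 channel) and the conversion from TB to TiB are assumptions made for illustration; the list does not state which unit the "ca. 430TB" and "870TB" figures use. The 2.4GHz AVX512 clock carries a question mark in the list and is used here only as an alternative input.

```python
"""Back-of-the-envelope check of the hardware figures quoted in the list."""

SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 16
FLOP_PER_CLOCK = 32            # flt64 per core and clock, as quoted in the list
BASE_CLOCK_GHZ = 2.9

MEM_CHANNELS_PER_SOCKET = 8
TRANSFER_RATE_GT_S = 3.2
BYTES_PER_TRANSFER = 8         # assumption: 64-bit data path per DDR4 channel


def peak_gflops_per_node(clock_ghz: float = BASE_CLOCK_GHZ) -> float:
    """Theoretical flt64 peak of one node in GFLOP/s at the given clock."""
    return SOCKETS_PER_NODE * CORES_PER_SOCKET * FLOP_PER_CLOCK * clock_ghz


def mem_bandwidth_gb_s_per_socket() -> float:
    """Theoretical memory bandwidth of one CPU in GB/s."""
    return MEM_CHANNELS_PER_SOCKET * TRANSFER_RATE_GT_S * BYTES_PER_TRANSFER


def raid6_data_capacity_tb(nodes: int, groups_per_node: int = 3,
                           data_disks: int = 8, disk_tb: int = 4) -> int:
    """Capacity of the data disks only (8 of 8+2 per RAID6 group), in TB."""
    return nodes * groups_per_node * data_disks * disk_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes to binary tebibytes."""
    return tb * 1e12 / 2**40


if __name__ == "__main__":
    print(f"peak at base clock:   {peak_gflops_per_node():.1f} GFLOP/s per node")
    print(f"peak at 2.4 GHz:      {peak_gflops_per_node(2.4):.1f} GFLOP/s per node")
    bw = mem_bandwidth_gb_s_per_socket()
    print(f"memory bandwidth:     {bw:.1f} GB/s per CPU, "
          f"{SOCKETS_PER_NODE * bw:.1f} GB/s per node")
    for n in (5, 10):
        tb = raid6_data_capacity_tb(n)
        print(f"{n:2d} storage nodes:     {tb} TB = {tb_to_tib(tb):.0f} TiB")
```

At the base clock this gives about 2970 GFLOP/s per node, in line with the quoted 3 TFLOP/node, and 204.8 GB/s per CPU, in line with the quoted 205 GB/s/CPU. The data-disk capacity comes out at 480 TB (about 437 TiB) for 5 nodes and 960 TB (about 873 TiB) for 10 nodes. That would fit the quoted totals if they are given in binary units, but this reading is a guess.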

Software

User Access:

Access via ssh sofja.urz.uni-magdeburg.de (141.44.5.38) is only allowed from within the university IP range. Please log in with your university account name. We recommend using ssh public keys for passwordless logins. Please send your ssh public key together with a short description of your project, the project duration, and the maximum storage (in GB) you expect to need during the project. Students need an informal confirmation from their university tutor that they are allowed to use central HPC resources for science. This machine is not suited for work with personal data. If you use Windows and Exceed for (graphical) access, take a look at the windows/ssh configuration hints.

Please note that the HPC storage is not intended for long-term data archiving. There is only some hardware redundancy (RAID6) and no backup. We deliberately do not run backups so as not to reduce the performance of the applications. You are therefore responsible for saving your data outside the HPC system. Please remove data you no longer need to leave more space for others. Thanks!

For questions and problems, please contact the administration via mailto:Joerg.Schulenburg+hpc21(at)URZ.Uni-Magdeburg.DE?subject=hpc-sofja or Tel.58408 (German or English).
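
For the passwordless login with ssh public keys, a client-side OpenSSH configuration entry can save typing. The Python sketch below only renders such an entry as text; it is not an official configuration of this cluster. The alias `sofja`, the key path `~/.ssh/id_ed25519`, and the account name `jdoe` are hypothetical placeholders. The host name is the one given above, and `Host`, `HostName`, `User`, `IdentityFile`, and `IdentitiesOnly` are standard OpenSSH client options.

```python
"""Render an OpenSSH client config entry for the cluster login node (sketch)."""


def ssh_config_entry(user: str,
                     alias: str = "sofja",  # hypothetical short name
                     host: str = "sofja.urz.uni-magdeburg.de",
                     identity_file: str = "~/.ssh/id_ed25519") -> str:
    """Return a config stanza that could be appended to ~/.ssh/config."""
    if not user or any(ch.isspace() for ch in user):
        raise ValueError("user must be a non-empty account name without spaces")
    lines = [
        f"Host {alias}",
        f"    HostName {host}",
        f"    User {user}",                    # your university account name
        f"    IdentityFile {identity_file}",   # assumed private key location
        "    IdentitiesOnly yes",              # offer only the key named above
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "jdoe" is a placeholder account name, not a real user.
    print(ssh_config_entry("jdoe"), end="")
```

With such an entry in `~/.ssh/config`, `ssh sofja` would connect with the given account and key, provided the matching public key has been registered and the connection comes from within the university IP range.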

Privacy/Security

History/Timeline:

Projects:

This is an incomplete list of projects on this cluster, to give you an impression of what the cluster is used for.

Questions and Answers:

Problems:

...

Further HPC-Systems:


More information on the central HPC compute servers is available on the CMS websites (content management system) or in the fall-back OvGU-HPC overview.


Author: Joerg Schulenburg, Uni-Magdeburg URZ, Tel. 58408 (2021-2026)