IMTM HPC Cluster

IMTM operates its own high-performance computing (HPC) facility, designed to handle the demanding needs of cutting-edge research in areas like bioinformatics, chemoinformatics, and genome analysis. Since many of these projects involve sensitive patient data, the entire system is managed solely by IMTM staff—no external vendors or third parties are involved, ensuring maximum data security and privacy.

IMTM Datacenter

Compute Resources

Node class   CPU                        Nodes   Cores / node   RAM / node
Density      AMD EPYC 9654 (96-core)    12      192            512 GiB
Frequency    AMD EPYC 9474F (48-core)   5       96             768 GiB

Total capacity: 2,784 CPU cores and 9.8 TiB of RAM.

Theoretical Peak FP64 Performance

Clock profile   12 × AMD EPYC 9654 nodes   5 × AMD EPYC 9474F nodes   Whole cluster
Base clocks     88.5 TFLOPS                27.6 TFLOPS                116 TFLOPS
Max boost       136.4 TFLOPS               31.5 TFLOPS                168 TFLOPS
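
The figures above can be reproduced from cores × clock × FLOPs/cycle: Zen 4 cores (the EPYC 9004 series) execute AVX-512 FMAs double-pumped over 256-bit datapaths, yielding 16 FP64 FLOPs per cycle per core. A quick sanity check, using AMD's published clocks (EPYC 9654: 2.4 GHz base / 3.7 GHz boost; EPYC 9474F: 3.6 GHz base / 4.1 GHz boost):

```python
# Peak FP64 = nodes x cores/node x clock (GHz) x FLOPs/cycle.
# Zen 4: 16 FP64 FLOPs per cycle per core (double-pumped AVX-512 FMA).
FP64_FLOPS_PER_CYCLE = 16

def peak_tflops(nodes, cores_per_node, clock_ghz):
    """Theoretical peak FP64 throughput in TFLOPS for a set of identical nodes."""
    return nodes * cores_per_node * clock_ghz * FP64_FLOPS_PER_CYCLE / 1000

# 12 density nodes (192 cores each) + 5 frequency nodes (96 cores each).
base = peak_tflops(12, 192, 2.4) + peak_tflops(5, 96, 3.6)
boost = peak_tflops(12, 192, 3.7) + peak_tflops(5, 96, 4.1)
print(f"base: {base:.0f} TFLOPS, boost: {boost:.0f} TFLOPS")  # 116 and 168
```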

Networking

Every node is connected to our core switches with two bonded 25 GbE SFP28 ports, giving 50 Gbit/s of effective bandwidth per node with RoCE v2 RDMA support. Aggregate leaf bandwidth is 850 Gbit/s across the 17 compute nodes.
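
The aggregate figure is simply the per-node bond multiplied by the node count:

```python
# 2 x 25 GbE bonded per node; 17 compute nodes (12 density + 5 frequency).
per_node_gbit = 2 * 25
aggregate_gbit = (12 + 5) * per_node_gbit
print(per_node_gbit, aggregate_gbit)  # 50 850
```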

The cluster has a 10 Gbit/s internet uplink and a dedicated 100 Gbit/s line to the IT4I supercomputer.

Storage

Tier       Drives          Raw capacity
NVMe SSD   192 × 7.68 TB   1.48 PB
HDD        128 × 18 TB     2.30 PB

Total raw capacity: 3.78 PB across the two-tier SSD/HDD setup.
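
The totals follow directly from the drive counts (in the decimal terabytes drive vendors specify):

```python
# Raw capacity per tier, in decimal TB.
nvme_tb = 192 * 7.68   # 1474.56 TB
hdd_tb = 128 * 18      # 2304 TB
total_pb = (nvme_tb + hdd_tb) / 1000
print(f"{total_pb:.2f} PB")  # 3.78 PB
```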

Software

Our cluster comes preinstalled with a wide range of scientific and development tools. It uses Lmod, a Lua-based environment module system, to manage and load software modules from the preinstalled library. These modules are built using EasyBuild, ensuring reproducibility, compatibility, and ease of deployment across the system.

Scheduler

The cluster's workload orchestration relies on the latest stable release of OpenPBS, augmented with a suite of custom Python hooks that tailor scheduling to IMTM's research priorities. These hooks enforce fair-share policies, dynamic QoS levels, cgroup-based resource isolation, and scratch-space management.
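
To illustrate the kind of logic such a hook might contain (a hypothetical sketch, not IMTM's actual hook code), the snippet below sizes a per-job scratch quota from the requested core count; in a real OpenPBS queuejob hook this function would be fed from the job's resource request via the server-side `pbs` module, and violations rejected there. The policy constants are assumptions for illustration only:

```python
# Hypothetical scratch-space policy: scale the per-job scratch quota with
# the number of requested cores, capped at a per-node maximum.
SCRATCH_GB_PER_CORE = 10   # assumed policy value, illustrative only
SCRATCH_GB_CAP = 1000      # assumed per-node cap, illustrative only

def scratch_quota_gb(ncpus: int) -> int:
    """Return the scratch quota (GB) granted to a job requesting `ncpus` cores."""
    if ncpus < 1:
        raise ValueError("a job must request at least one core")
    return min(ncpus * SCRATCH_GB_PER_CORE, SCRATCH_GB_CAP)

print(scratch_quota_gb(8))    # 80
print(scratch_quota_gb(192))  # 1000 (capped)
```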

Monitoring

The cluster is actively monitored using Prometheus, which collects real-time metrics on system performance and resource usage. These metrics are visualized through Grafana dashboards, allowing operators to monitor cluster health, identify performance bottlenecks, and make informed decisions about resource allocation and workload management.
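
Beyond the dashboards, the same metrics can be queried programmatically through Prometheus's HTTP API (`/api/v1/query` for instant queries). The sketch below builds such a request for per-node CPU utilization; the server address is a placeholder, and the PromQL assumes the standard `node_exporter` metric `node_cpu_seconds_total`:

```python
from urllib.parse import urlencode

# Placeholder Prometheus address, for illustration only.
PROM_URL = "http://prometheus.example:9090"

def instant_query_url(promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{PROM_URL}/api/v1/query?{urlencode({'query': promql})}"

# Average non-idle CPU fraction per node over the last 5 minutes.
url = instant_query_url(
    '1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))'
)
print(url)
```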