# IMTM HPC Cluster
IMTM operates its own high-performance computing (HPC) facility, designed to handle the demanding needs of cutting-edge research in areas like bioinformatics, chemoinformatics, and genome analysis. Since many of these projects involve sensitive patient data, the entire system is managed solely by IMTM staff—no external vendors or third parties are involved, ensuring maximum data security and privacy.

## Compute Resources
| Node class | CPU | Nodes | Cores / node | RAM / node |
|---|---|---|---|---|
| Density | AMD EPYC 9654 96-core | 12 | 192 | 512 GiB |
| Frequency | AMD EPYC 9474F 48-core | 5 | 96 | 768 GiB |

Total capacity: 2784 CPU cores and 9.8 TiB of RAM.

### Theoretical Peak FP64 Performance

| Clock profile | AMD EPYC 9654 nodes (12) | AMD EPYC 9474F nodes (5) | Whole cluster |
|---|---|---|---|
| Base clocks | 88.5 TFLOPS | 27.6 TFLOPS | 116 TFLOPS |
| Max boost | 136.4 TFLOPS | 31.5 TFLOPS | 168 TFLOPS |
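The figures above follow from nodes × cores per node × clock × FLOPs per cycle: Zen 4 (EPYC 9004) cores retire 16 FP64 FLOPs per cycle via AVX-512, and the clocks used here are AMD's published base/boost frequencies for these SKUs. A quick sketch of the arithmetic:

```python
# Theoretical peak FP64 = nodes × cores/node × clock (GHz) × FLOPs/cycle.
# Zen 4 (EPYC 9004) retires 16 FP64 FLOPs per core per cycle with AVX-512.
FLOPS_PER_CYCLE = 16

def peak_tflops(nodes: int, cores_per_node: int, clock_ghz: float) -> float:
    """Theoretical peak FP64 throughput in TFLOPS."""
    return nodes * cores_per_node * clock_ghz * FLOPS_PER_CYCLE / 1000

# AMD's published clocks: 9654 base 2.40 / boost 3.70 GHz,
#                         9474F base 3.60 / boost 4.10 GHz.
print(peak_tflops(12, 192, 2.40))  # Density nodes at base clocks ≈ 88.5
print(peak_tflops(5, 96, 3.60))    # Frequency nodes at base clocks ≈ 27.6
```

Summing the two node classes reproduces the whole-cluster rows: roughly 116 TFLOPS at base clocks and 168 TFLOPS at maximum boost.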

## Networking
Every node connects to the core switches via two bonded 25 GbE SFP28 ports, providing 50 Gbit/s of effective bandwidth per node with RoCE v2 RDMA support. Aggregate leaf bandwidth is 850 Gbit/s across the 17 compute nodes.
The cluster reaches the internet through a 10 Gbit/s uplink and is connected to the IT4I supercomputer by a dedicated 100 Gbit/s line.

## Storage
| Tier | Drives | Raw capacity |
|---|---|---|
| NVMe SSD | 192 × 7.68 TB | 1.48 PB |
| HDD | 128 × 18 TB | 2.30 PB |

Total raw capacity is 3.78 PB across the two-tier SSD/HDD architecture.

## Software
Our cluster comes preinstalled with a wide range of scientific and development tools. It uses Lmod, a Lua-based environment module system, to manage and load software modules from the preinstalled library. These modules are built using EasyBuild, ensuring reproducibility, compatibility, and ease of deployment across the system.
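A typical Lmod session might look like the following; the package and toolchain names are illustrative, not a list of what is actually installed:

```shell
# Discover and load software with Lmod (module/version names are examples)
module avail                          # list modules built by EasyBuild
module spider GROMACS                 # search all module trees for a package
module load GCC/12.3.0 OpenMPI/4.1.5  # load an example compiler toolchain
module list                           # show currently loaded modules
module purge                          # return to a clean environment
```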

## Scheduler
The cluster's workload orchestration relies on the latest stable release of OpenPBS, augmented with a suite of custom Python hooks that tailor scheduling to IMTM's research priorities. These hooks enforce fair-share policies, dynamic QoS levels, cgroup-based resource isolation, and scratch space management.
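As a flavour of what such a hook can do, here is a minimal sketch of a QoS walltime check — the QoS names and caps are hypothetical, and the policy logic is factored into a plain function so it can run outside the PBS server:

```python
# Sketch of a queuejob-style policy check (QoS names and caps are invented
# for illustration; IMTM's actual hooks are not published).

# Hypothetical per-QoS walltime caps, in seconds
QOS_WALLTIME_CAP = {
    "express": 4 * 3600,        # short, high-priority jobs
    "normal": 48 * 3600,        # default workloads
    "long": 14 * 24 * 3600,     # long-running analyses
}

def walltime_allowed(qos: str, requested_s: int) -> bool:
    """Return True if the requested walltime fits within the QoS cap."""
    return requested_s <= QOS_WALLTIME_CAP.get(qos, 0)

# Inside a real OpenPBS hook this function would be driven by the `pbs`
# module (available only within the PBS server), roughly along the lines of:
#
#   import pbs
#   e = pbs.event()
#   job = e.job
#   qos = str(job.Resource_List["qos"] or "normal")
#   if not walltime_allowed(qos, int(job.Resource_List["walltime"])):
#       e.reject("walltime exceeds the cap for QoS %s" % qos)
```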

## Monitoring
The cluster is actively monitored using Prometheus, which collects real-time metrics on system performance and resource usage. These metrics are visualized through Grafana dashboards, allowing operators to track cluster health, identify performance bottlenecks, and make informed decisions about resource allocation and workload management.
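For illustration, a Prometheus setup like this typically scrapes a node_exporter on each compute node; the fragment below is a generic `prometheus.yml` sketch with invented job and host names, not IMTM's actual configuration:

```yaml
# prometheus.yml fragment — job and target names are illustrative
scrape_configs:
  - job_name: "node"            # node_exporter on every compute node
    scrape_interval: 15s
    static_configs:
      - targets:
          - "density01:9100"
          - "frequency01:9100"
```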