What is the HPC hardware ecosystem at Lafayette?
Members of the Lafayette community have access to Firebird, the College’s high-performance computing cluster. The Firebird HPC cluster comprises:
- A head/login node with dual Intel 32-core, 64-thread Xeon Gold 6530 (Emerald Rapids) 2.1GHz processors, 256GB memory, and 98TB storage
Compute nodes
- One compute node, with dual Intel 10-core, 20-thread Xeon Gold 5215 (Cascade Lake) 2.1GHz processors, 192GB memory, and 22TB scratch space
- Three compute nodes, each with dual Intel 26-core, 52-thread Xeon Gold 6230R (Cascade Lake) 2.1GHz processors, 192GB memory, and 830GB scratch space
- Three compute nodes, each with dual Intel 20-core, 40-thread Xeon Gold 6230 (Skylake) 2.1GHz processors, 192GB memory, and 830GB scratch space
- Three compute nodes, each with dual Intel 20-core, 40-thread Xeon Gold 6230 (Skylake) 2.1GHz processors, 384GB memory, and 830GB scratch space
- Six compute nodes, each with dual Intel 32-core, 64-thread Xeon Gold 6338 (Ice Lake) 2.0GHz processors, 512GB memory, and 1.7TB scratch space
- Three compute nodes, each with dual Intel 32-core, 64-thread Xeon Gold 6430 (Sapphire Rapids) 2.1GHz processors, 512GB memory, and 7TB scratch space
- Thirteen compute nodes, each with dual Intel 32-core, 64-thread Xeon Gold 6530 (Emerald Rapids) 2.1GHz processors, 512GB memory, and 1.9TB scratch space
High Memory nodes
- One high-memory node, with dual Intel 18-core, 36-thread Xeon Gold 6240 (Skylake) 2.6GHz processors, 768GB memory, and 830GB scratch space
- Three high-memory nodes, each with dual Intel 20-core, 40-thread Xeon Gold 6230 (Skylake) 2.1GHz processors, 768GB memory, and 830GB scratch space
- One high-memory node, with dual Intel 32-core, 64-thread Xeon Gold 6530 (Emerald Rapids) 2.1GHz processors, 2TB memory, and 1.7TB scratch space
- One high-memory node, with dual Intel 32-core, 64-thread Xeon Gold 6530P (Emerald Rapids) 2.3GHz processors, 2TB memory, and 1.7TB scratch space
High CPU nodes
- One high-CPU node, with dual Intel 36-core, 72-thread Xeon Platinum 8360Y (Ice Lake-SP) 2.4GHz processors, 1TB memory, and 1.7TB scratch space
- One high-CPU node, with dual AMD 96-core, 192-thread EPYC 9654 (Genoa) 2.4GHz (3.7GHz Max Boost) processors, 768GB memory, and 1.7TB scratch space
GPU nodes
- Two GPU nodes, each with dual Intel 20-core, 40-thread Xeon Gold 6230 (Skylake) 2.1GHz processors, 384GB memory, 830GB scratch space, and four NVIDIA RTX 2080 Ti Turing single-precision GPUs, each with 11GB GDDR6 memory, 4,352 CUDA Parallel-Processing Cores, 544 Tensor Cores, and 68 RT Cores
- One GPU node, with dual Intel 16-core, 32-thread Xeon Gold 6226R (Cascade Lake) 2.9GHz processors, 192GB memory, 1.7TB scratch space, and three NVIDIA RTX 8000 Turing single-precision GPUs, each with 48GB GDDR6 memory, 4,608 CUDA Parallel-Processing Cores, 576 Tensor Cores, and 72 RT Cores
- One GPU node, with dual Intel 32-core, 64-thread Xeon Gold 6530 (Emerald Rapids) 2.1GHz processors, 512GB memory, 7TB scratch space, and four NVIDIA L40S Ada Lovelace single-precision GPUs, each with 48GB GDDR6 memory, 18,176 CUDA Parallel-Processing Cores, 568 Tensor Cores, and 142 RT Cores
- One GPU node, with dual Intel 32-core, 64-thread Xeon Gold 6530 (Emerald Rapids) 2.1GHz processors, 1TB memory, 1.7TB scratch space, and two NVIDIA H200 Hopper double-precision Tensor Core GPUs, each with 141GB HBM3e high-bandwidth memory, 16,896 CUDA Parallel-Processing Cores, and 528 Tensor Cores
Storage nodes
- One NFS-based node, providing 350TB storage
- One NFS-based node, providing 66TB storage
- One BeeGFS-based node, providing 786TB storage
Interconnect
- All nodes are connected by an Omni-Path (OPA) fabric at 100Gbps (12.5GB/s) per direction, for 25GB/s of bidirectional bandwidth
Cluster Scheduling and Resource Management
- Slurm is used for resource allocation and job scheduling (see the example job submission sketch after this list)
Operating System
- All nodes run Rocky Linux version 9.6
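
As a quick illustration of how work reaches these nodes, jobs are submitted to the Slurm scheduler as short batch scripts via sbatch. The sketch below shows one way to do that from Python; the resource requests, the output filename pattern, and the my_analysis.py script are illustrative assumptions rather than Firebird-specific guidance, so consult our Slurm help page for the cluster's actual partitions and limits.

```python
# A minimal sketch of submitting a Slurm batch job from Python.
# The resource requests and "my_analysis.py" are illustrative assumptions,
# not Firebird-specific recommendations.
import subprocess
import tempfile

job_script = """#!/bin/bash
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00
#SBATCH --output=example_%j.out

srun python my_analysis.py
"""

# Write the batch script to a temporary file and hand it to sbatch.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write(job_script)
    script_path = f.name

# On success, sbatch prints "Submitted batch job <jobid>".
result = subprocess.run(["sbatch", script_path], capture_output=True, text=True)
print(result.stdout.strip() or result.stderr.strip())
```

The same batch script can also be saved to a file and submitted directly with sbatch from the login node; squeue -u $USER then shows where the job sits in the queue.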
In addition to Firebird, the R-HPC team offers consultation services to review your research and teaching needs and recommend appropriate resources to meet them.
How are Slurm jobs prioritized?
In situations where insufficient computational resources are available to handle all pending jobs, the Slurm resource management software uses the “fair-share” algorithm to determine a researcher’s priority.
Essentially, your priority for Firebird’s shared resources is determined by the volume of computational resources your jobs have recently consumed. Heavier use gradually reduces your priority, allowing the jobs of lighter users to be placed earlier in the job queue.
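To make the effect concrete: Slurm's classic fair-share formula computes a factor F = 2^(-U/S), where U is your normalized recent usage and S is your normalized share of the cluster, so the factor shrinks as recent usage grows. The sketch below works through that formula with made-up numbers; Firebird's actual priority weights, and whether it uses this classic formula or Slurm's newer Fair Tree algorithm, are assumptions here rather than documented settings.

```python
# A minimal sketch of Slurm's classic fair-share factor, F = 2 ** (-usage / shares).
# Illustrative only: Firebird's actual priority weights, and whether it uses this
# classic formula or Slurm's Fair Tree algorithm, are assumptions here.

def fairshare_factor(normalized_usage: float, normalized_shares: float) -> float:
    """Return a factor in (0, 1]; heavier recent usage yields a lower factor."""
    if normalized_shares <= 0:
        return 0.0
    return 2.0 ** (-normalized_usage / normalized_shares)

# Two researchers with equal shares of the cluster, but different recent usage:
light_user = fairshare_factor(normalized_usage=0.10, normalized_shares=0.50)
heavy_user = fairshare_factor(normalized_usage=0.90, normalized_shares=0.50)

print(f"light user: {light_user:.3f}")  # ~0.871 -> scheduled earlier
print(f"heavy user: {heavy_user:.3f}")  # ~0.287 -> scheduled later
```

On the cluster itself, the sshare command reports your current fair-share standing, and sprio shows how each pending job's priority was computed.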
Additional information about job prioritization and the use of Slurm can be found on our Slurm help page.
Can I purchase computational nodes to which I would have exclusive access?
The compute nodes that comprise the Firebird cluster are available for general, shared use.
If you need dedicated access to resources (e.g., as part of a grant or with startup funds), it is possible to provide you and members of your research lab, department, etc., with priority access through Slurm. You, your students, and collaborators can then preempt other users’ workloads running on your dedicated resources.
In such cases, during times when your portion of the cluster is unused, those resources would be available for general use by the broader Firebird community.
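Mechanically, this kind of priority access is commonly implemented with an overlapping, higher-priority Slurm partition and partition-based preemption. The fragment below is a hedged sketch of what that can look like in slurm.conf; the node, partition, and account names are hypothetical and do not describe Firebird's actual configuration.

```
# Hypothetical slurm.conf fragment; names are placeholders, not Firebird's configuration.
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

# General-access partition: lower priority tier, jobs here may be preempted
PartitionName=general Nodes=node[01-40] Default=YES PriorityTier=1 State=UP

# Lab-owned nodes: higher priority tier, restricted to the owning account
PartitionName=mylab Nodes=node[39-40] Default=NO PriorityTier=2 AllowAccounts=mylab State=UP
```

In this sketch, jobs in the general partition run on the lab's nodes whenever they are idle and are requeued to make room when work is submitted to the higher-tier lab partition, which matches the sharing behavior described above.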
What about research computing or custom-built systems?
While the Firebird HPC cluster is suitable for many research and teaching use cases, other solutions may be necessary in certain instances. The R-HPC team is available to consult on your individual needs.
- If you have specialized computational requirements, e.g., a dedicated system with multiple GPUs or other resources to which you need ongoing exclusive access, in many cases we can install such a system in our colocation facility. Doing so provides benefits such as redundant power, cooling, secure access, and data backup services; in some cases, we may also be able to assist with system-level management (e.g., patching and user management) so that you can concentrate on conducting your research rather than on system administration.
- In certain cases, workstations or servers may need to be located in labs or other spaces. Depending on their configuration, we may still be able to assist with certain system administration tasks.