What is the HPC hardware ecosystem at Lafayette?
Members of the Lafayette community have access to Firebird, the College’s high-performance computing cluster. The Firebird HPC cluster comprises:
- A head/login node with dual Intel 32-core, 64-thread Xeon Gold 6530 (Emerald Rapids) 2.1GHz processors, 256GB memory, and 98TB storage
Compute nodes
- One compute node, with dual Intel 10-core, 20-thread Xeon Gold 5215 (Cascade Lake) 2.1GHz processors, 192GB memory, and 22TB scratch space
- Three compute nodes, each with dual Intel 26-core, 52-thread Xeon Gold 6230R (Cascade Lake) 2.1GHz processors, 192GB memory, and 830GB scratch space
- Three compute nodes, each with dual Intel 20-core, 40-thread Xeon Gold 6230 (Skylake) 2.1GHz processors, 192GB memory, and 830GB scratch space
- Three compute nodes, each with dual Intel 20-core, 40-thread Xeon Gold 6230 (Skylake) 2.1GHz processors, 384GB memory, and 830GB scratch space
- Six compute nodes, each with dual Intel 32-core, 64-thread Xeon Gold 6338 (Ice Lake) 2.0GHz processors, 512GB memory, and 1.7TB scratch space
- Three compute nodes, each with dual Intel 32-core, 64-thread Xeon Gold 6430 (Sapphire Rapids) 2.1GHz processors, 512GB memory, and 7TB scratch space
- Thirteen compute nodes, each with dual Intel 32-core, 64-thread Xeon Gold 6530 (Emerald Rapids) 2.1GHz processors, 512GB memory, and 1.9TB scratch space
High Memory nodes
- One high-memory node, with dual Intel 18-core, 36-thread Xeon Gold 6240 (Skylake) 2.6GHz processors, 768GB memory, and 830GB scratch space
- Three high-memory nodes, each with dual Intel 20-core, 40-thread Xeon Gold 6230 (Skylake) 2.1GHz processors, 768GB memory, and 830GB scratch space
- One high-memory node, with dual Intel 32-core, 64-thread Xeon Gold 6530 (Emerald Rapids) 2.1GHz processors, 2TB memory, and 1.7TB scratch space
- One high-memory node, with dual Intel 32-core, 64-thread Xeon Gold 6530P (Emerald Rapids) 2.3GHz processors, 2TB memory, and 1.7TB scratch space
High CPU nodes
- One high-CPU node, with dual Intel 36-core, 72-thread Xeon Platinum 8360Y (Ice Lake-SP) 2.4GHz processors, 1TB memory, and 1.7TB scratch space
- One high-CPU node, with dual AMD 96-core, 192-thread EPYC 9654 (Genoa) 2.4GHz (3.7GHz Max Boost) processors, 768GB memory, and 1.7TB scratch space
GPU nodes
- Two GPU nodes, each with dual Intel 20-core, 40-thread Xeon Gold 6230 (Skylake) 2.1GHz processors, 384GB memory, 830GB scratch space, and four NVIDIA RTX 2080 Ti Turing single-precision GPUs, each with 11GB GDDR6 memory, 4,352 CUDA Parallel-Processing Cores, 544 Tensor Cores, and 68 RT Cores
- One GPU node, with dual Intel 16-core, 32-thread Xeon Gold 6226R (Cascade Lake) 2.9GHz processors, 192GB memory, 1.7TB scratch space, and three NVIDIA RTX 8000 Turing single-precision GPUs, each with 48GB GDDR6 memory, 4,608 CUDA Parallel-Processing Cores, 576 Tensor Cores, and 72 RT Cores
- One GPU node, with dual Intel 32-core, 64-thread Xeon Gold 6530 (Emerald Rapids) 2.1GHz processors, 512GB memory, 7TB scratch space, and four NVIDIA L40S Ada Lovelace single-precision GPUs, each with 48GB GDDR6 memory, 18,176 CUDA Parallel-Processing Cores, 568 Tensor Cores, and 142 RT Cores
- One GPU node, with dual Intel 32-core, 64-thread Xeon Gold 6530 (Emerald Rapids) 2.1GHz processors, 1TB memory, 1.7TB scratch space, and two NVIDIA H200 Hopper double-precision Tensor Core GPUs, each with 141GB HBM3e high-bandwidth memory, 16,896 CUDA Parallel-Processing Cores, and 528 Tensor Cores
Storage nodes
- One NFS-based node, providing 350TB storage
- One NFS-based node, providing 66TB storage
- One BeeGFS-based node, providing 786TB storage
Interconnect
- All nodes are connected by an Omni-Path (OPA) fabric at 100Gbps (12.5GB/s) per direction, for 25GB/s of bidirectional bandwidth
Cluster Scheduling and Resource Management
- Slurm is used for resource allocation and job scheduling (see the example job submission sketch after this list)
Operating System
- All nodes run Rocky Linux version 9.6
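
As a quick illustration of how work reaches these nodes, jobs are submitted to the Slurm scheduler as short batch scripts via sbatch. The sketch below shows one way to do that from Python; the resource requests, the output filename pattern, and the my_analysis.py script are illustrative assumptions rather than Firebird-specific guidance, so consult our Slurm help page for the cluster's actual partitions and limits.

```python
# A minimal sketch of submitting a Slurm batch job from Python.
# The resource requests and "my_analysis.py" are illustrative assumptions,
# not Firebird-specific recommendations.
import subprocess
import tempfile

job_script = """#!/bin/bash
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00
#SBATCH --output=example_%j.out

srun python my_analysis.py
"""

# Write the batch script to a temporary file and hand it to sbatch.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write(job_script)
    script_path = f.name

# On success, sbatch prints "Submitted batch job <jobid>".
result = subprocess.run(["sbatch", script_path], capture_output=True, text=True)
print(result.stdout.strip() or result.stderr.strip())
```

The same batch script can also be saved to a file and submitted directly with sbatch from the login node; squeue -u $USER then shows where the job sits in the queue.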
In addition to Firebird, the R-HPC team offers consultation services to review your research and teaching needs and recommend appropriate resources to meet them.
How are Slurm jobs prioritized?
In situations where insufficient computational resources are available to handle all pending jobs, the Slurm resource management software uses the “fair-share” algorithm to determine a researcher’s priority.
Essentially, your priority for Firebird’s shared resources is determined by the volume of computational resources your jobs have recently consumed. Heavier use gradually reduces your priority, allowing the jobs of lighter users to be placed earlier in the job queue.
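To make the effect concrete: Slurm's classic fair-share formula computes a factor F = 2^(-U/S), where U is your normalized recent usage and S is your normalized share of the cluster, so the factor shrinks as recent usage grows. The sketch below works through that formula with made-up numbers; Firebird's actual priority weights, and whether it uses this classic formula or Slurm's newer Fair Tree algorithm, are assumptions here rather than documented settings.

```python
# A minimal sketch of Slurm's classic fair-share factor, F = 2 ** (-usage / shares).
# Illustrative only: Firebird's actual priority weights, and whether it uses this
# classic formula or Slurm's Fair Tree algorithm, are assumptions here.

def fairshare_factor(normalized_usage: float, normalized_shares: float) -> float:
    """Return a factor in (0, 1]; heavier recent usage yields a lower factor."""
    if normalized_shares <= 0:
        return 0.0
    return 2.0 ** (-normalized_usage / normalized_shares)

# Two researchers with equal shares of the cluster, but different recent usage:
light_user = fairshare_factor(normalized_usage=0.10, normalized_shares=0.50)
heavy_user = fairshare_factor(normalized_usage=0.90, normalized_shares=0.50)

print(f"light user: {light_user:.3f}")  # ~0.871 -> scheduled earlier
print(f"heavy user: {heavy_user:.3f}")  # ~0.287 -> scheduled later
```

On the cluster itself, the sshare command reports your current fair-share standing, and sprio shows how each pending job's priority was computed.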
Additional information about job prioritization and the use of Slurm can be found on our Slurm help page.
Can I purchase computational nodes to which I would have exclusive access?
The compute nodes that comprise the Firebird cluster are available for general, shared use.
If you need dedicated access to resources (e.g., as part of a grant or with startup funds), it is possible to provide you and members of your research lab, department, etc., with priority access through Slurm. You, your students, and collaborators can then preempt other users’ workloads running on your dedicated resources.
In such cases, during times when your portion of the cluster is unused, those resources would be available for general use by the broader Firebird community.
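Mechanically, this kind of priority access is commonly implemented with an overlapping, higher-priority Slurm partition and partition-based preemption. The fragment below is a hedged sketch of what that can look like in slurm.conf; the node, partition, and account names are hypothetical and do not describe Firebird's actual configuration.

```
# Hypothetical slurm.conf fragment; names are placeholders, not Firebird's configuration.
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

# General-access partition: lower priority tier, jobs here may be preempted
PartitionName=general Nodes=node[01-40] Default=YES PriorityTier=1 State=UP

# Lab-owned nodes: higher priority tier, restricted to the owning account
PartitionName=mylab Nodes=node[39-40] Default=NO PriorityTier=2 AllowAccounts=mylab State=UP
```

In this sketch, jobs in the general partition run on the lab's nodes whenever they are idle and are requeued to make room when work is submitted to the higher-tier lab partition, which matches the sharing behavior described above.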
What about research computing or custom-built systems?
While the Firebird HPC cluster is suitable for many research and teaching use cases, other solutions may be necessary in certain instances. The R-HPC team is available to consult on your individual needs.
- If you have specialized computational requirements, e.g., a dedicated system with multiple GPUs or other resources to which you need ongoing exclusive access, in many cases we can install such a system in our colocation facility. Doing so provides benefits such as redundant power, cooling, secure access, and data backup services; in some cases, we may also be able to assist with system-level management (e.g., patching and user management) so that you can concentrate on conducting your research rather than on system administration.
- In certain cases, workstations or servers may need to be located in labs or other spaces. Depending on their configuration, we may still be able to assist with certain system administration tasks.