What is the HPC hardware ecosystem at Lafayette?

Members of the Lafayette community have access to Firebird, the College’s high-performance computing cluster.

Firebird quick facts:

  • 1724 compute cores
  • 15 GPUs
  • 20 compute nodes
  • 5 high-memory nodes
  • 2 high-CPU nodes
  • 4 GPU nodes

The Firebird computational cluster comprises:

  • A head/login node with dual Intel 32-core, 64-thread Xeon Gold 6530 (Emerald Rapids) 2.1GHz processors, 256GB memory, and 98TB storage

Compute nodes

  • One compute node, with dual Intel 10-core, 20-thread Xeon Gold 5215 (Cascade Lake) 2.1GHz processors, 192GB memory, and 22TB scratch space
  • Three compute nodes, each with dual Intel 26-core, 52-thread Xeon Gold 6230R (Cascade Lake) 2.1GHz processors, 192GB memory, and 830GB scratch space
  • Three compute nodes, each with dual Intel 20-core, 40-thread Xeon Gold 6230 (Cascade Lake) 2.1GHz processors, 192GB memory, and 830GB scratch space
  • One compute node, with dual Intel 20-core, 40-thread Xeon Gold 6230 (Cascade Lake) 2.1GHz processors, 192GB memory, and 66TB scratch space
  • Three compute nodes, each with dual Intel 20-core, 40-thread Xeon Gold 6230 (Cascade Lake) 2.1GHz processors, 384GB memory, and 830GB scratch space
  • Six compute nodes, each with dual Intel 32-core, 64-thread Xeon Gold 6338 (Ice Lake) 2.0GHz processors, 512GB memory, and 1.7TB scratch space
  • Three compute nodes, each with dual Intel 32-core, 64-thread Xeon Gold 6430 (Sapphire Rapids) 2.1GHz processors, 512GB memory, and 7TB scratch space

High-memory nodes

  • One high-memory node, with dual Intel 18-core, 36-thread Xeon Gold 6240 (Cascade Lake) 2.6GHz processors, 768GB memory, and 830GB scratch space
  • Three high-memory nodes, each with dual Intel 20-core, 40-thread Xeon Gold 6230 (Cascade Lake) 2.1GHz processors, 768GB memory, and 830GB scratch space
  • One high-memory node, with dual Intel 32-core, 64-thread Xeon Gold 6530 (Emerald Rapids) 2.1GHz processors, 2TB memory, and 1.7TB scratch space

High-CPU nodes

  • One high-CPU node, with dual Intel 36-core, 72-thread Xeon Platinum 8360Y (Ice Lake) 2.4GHz processors, 1TB memory, and 1.7TB scratch space
  • One high-CPU node, with dual AMD 96-core, 192-thread EPYC 9654 (Genoa) 2.4GHz processors (3.7GHz max boost), 768GB memory, and 1.7TB scratch space

GPU nodes

  • Two GPU nodes, each with dual Intel 20-core, 40-thread Xeon Gold 6230 (Cascade Lake) 2.1GHz processors, 384GB memory, 830GB scratch space, and four Nvidia RTX 2080 Ti (Turing) single-precision GPUs, each with 11GB GDDR6 memory, 4,352 CUDA parallel-processing cores, 544 Tensor Cores, and 68 RT Cores
  • One GPU node, with dual Intel 16-core, 32-thread Xeon Gold 6226R (Cascade Lake) 2.9GHz processors, 192GB memory, 1.7TB scratch space, and three Nvidia Quadro RTX 8000 (Turing) single-precision GPUs, each with 48GB GDDR6 memory, 4,608 CUDA parallel-processing cores, 576 Tensor Cores, and 72 RT Cores
  • One GPU node, with dual Intel 32-core, 64-thread Xeon Gold 6530 (Emerald Rapids) 2.1GHz processors, 512GB memory, 7TB scratch space, and four Nvidia L40S (Ada Lovelace) GPUs, each with 48GB GDDR6 memory, 18,176 CUDA parallel-processing cores, 568 Tensor Cores, and 142 RT Cores

Storage nodes

  • One NFS node, managing a WD Data60 JBOD providing 350TB storage
  • One BeeGFS node, managing a WD Data60 JBOD providing 786TB storage

Interconnect

  • All nodes are connected by a 100Gbps Omni-Path (OPA) fabric (12.5GB/s per direction, 25GB/s bidirectional)

Cluster Scheduling and Resource Management

  • Slurm is used for resource allocation and job scheduling

Operating System

  • All nodes run Rocky Linux version 9

In addition, we offer the ability to stand up virtual machines with various configurations for research and educational needs.

How are Slurm jobs prioritized?

In situations where insufficient computational resources (e.g., cores, memory) are available to run all pending jobs, Slurm relies on its “fair-share” algorithm to determine priority. Essentially, if you have used relatively few computational resources recently, your pending jobs will be scheduled ahead of jobs from users who have consumed more. Additional information about job prioritization and using Slurm can be found on our Slurm help page.
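As a rough illustration of the idea (not the cluster’s exact configuration), the sketch below mirrors the classic Slurm fair-share form, F = 2^(-usage/shares); Firebird’s actual priority weights, usage-decay settings, and share allocations are site-specific, so the numbers here are hypothetical.

    # Simplified sketch of the fair-share idea (illustrative only; Firebird's
    # actual Slurm multifactor priority settings may weigh things differently).
    def fairshare_factor(normalized_usage: float, normalized_shares: float) -> float:
        """Near 1.0 for light recent usage, approaching 0.0 as usage grows."""
        if normalized_shares <= 0:
            return 0.0
        return 2 ** (-normalized_usage / normalized_shares)

    # Two users with equal shares: the lighter user's pending jobs sort earlier.
    print(round(fairshare_factor(0.05, 0.25), 2))  # ~0.87 (light recent usage)
    print(round(fairshare_factor(0.60, 0.25), 2))  # ~0.19 (heavy recent usage)

On the cluster itself, the sshare and sprio commands report the fair-share usage and per-job priority factors that Slurm actually computes.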

Can I purchase computational nodes on the cluster to which I have exclusive access?

In general, the nodes that comprise the computational cluster are available for general use. If you would like dedicated access to resources you have purchased (e.g., as part of a grant or with startup funds), we can provide you and any other relevant users (e.g., your research lab, department, etc.) priority access through Slurm that can preempt existing and subsequent requests for those resources. During times when your portion of the cluster is unused, those resources remain available for general use.
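As a purely hypothetical sketch of how such an arrangement is commonly expressed in Slurm (the partition names, node list, group, and preemption settings below are placeholders, not Firebird’s actual configuration), an owner partition can be layered over the same nodes with a higher priority tier:

    # Hypothetical slurm.conf excerpt: jobs in the owner partition can preempt
    # general-use jobs on the nodes that the lab purchased; when the owner
    # partition is idle, those nodes serve the general partition as usual.
    PreemptType=preempt/partition_prio
    PreemptMode=REQUEUE
    PartitionName=general  Nodes=node[01-20] Default=YES PriorityTier=1 PreemptMode=REQUEUE
    PartitionName=labowner Nodes=node[19-20] AllowGroups=labowner PriorityTier=10 PreemptMode=OFF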

What about research computing or custom-built systems?

While the computational cluster and VMs are suitable for many research and teaching use cases, in certain instances other solutions may be necessary. The Research and High-Performance Computing team is always available to consult on your individual needs.

  • If you require specialized high-performance computational systems, such as a dedicated system with multiple GPUs or other resources to which you need ongoing exclusive access, in many cases we can install such systems in our colocation facility. Doing so can provide benefits such as redundant power, appropriate cooling, secure access, and data backup services; in some cases, we might be able to assist with system-level management (e.g., OS patches, user management, etc.) so that you can concentrate on conducting your research rather than system administration.
  • In certain cases, workstations or servers may need to be located in labs or other spaces. Depending on their configuration, we may still be able to assist with certain system administration tasks.