What is the HPC hardware ecosystem at Lafayette?
The current computational cluster includes:
- A head/login node with dual Intel 10-core Xeon Gold 5215 (Cascade Lake) 2.5GHz processors, 192GB of memory, and 24TB of storage
- Three compute nodes, each with dual Intel 20-core Xeon Gold 6230 (Cascade Lake) 2.1GHz processors (for a total of 120 cores), 192GB of memory, and 1TB of disk space
- Three compute nodes, each with dual Intel 26-core Xeon Gold 6230R (Cascade Lake) 2.1GHz processors (for a total of 156 cores), 192GB of memory, and 1TB of disk space
- One high-memory compute node, with dual Intel 18-core Xeon Gold 6240 (Cascade Lake) 2.6GHz processors (for a total of 36 cores), 768GB of memory, and 1TB of disk space
- One Graphics Processing Unit (GPU) node, with dual Intel 16-core Xeon Gold 6226R (Cascade Lake) 2.9GHz processors, 192GB of memory, and 2TB of disk space
- Three NVIDIA Quadro RTX 8000 (single-precision) GPUs, each with 48GB of GDDR6 memory, 4,608 CUDA parallel-processing cores, 576 NVIDIA Tensor Cores, and 72 NVIDIA RT Cores
- One NFS node managing a 60-drive JBOD that provides 350TB of storage
- All nodes are connected by an EDR InfiniBand (100Gb/sec) network
- SLURM is used for resource allocation and job scheduling
- All nodes run Rocky Linux version 8
In addition, we offer some ability to stand up virtual machines (VMs) with various configurations for research and teaching needs.
How are Slurm jobs prioritized?
In situations where insufficient computational resources (e.g., cores, memory, etc.) are available to handle all pending jobs, Slurm relies on a “fair-share” algorithm to determine priority. Essentially, if you have not used many computational resources recently, you will have an earlier queue position than another user who has used a greater amount of resources. Additional information about job prioritization and using Slurm can be found on our Slurm help page.
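The intuition behind fair-share can be illustrated with the classic formula often used to describe Slurm's fair-share factor, F = 2^(-U/S), where U is a user's normalized recent usage and S is their normalized share of the cluster. The sketch below is illustrative only; the factor Slurm actually computes also incorporates usage decay and account hierarchy.

```python
def fairshare_factor(normalized_usage: float, normalized_shares: float) -> float:
    """Classic fair-share factor, F = 2 ** (-U/S), in the range [0, 1].

    Illustrative sketch only: Slurm's real calculation also applies
    usage decay over time and walks the account hierarchy.
    """
    return 2 ** (-normalized_usage / normalized_shares)

# A user who has used nothing recently gets the maximum factor:
print(fairshare_factor(0.0, 0.25))   # 1.0
# A user whose recent usage exactly matches their share gets 0.5:
print(fairshare_factor(0.25, 0.25))  # 0.5
# A heavy recent user trends toward 0 and queues later:
print(fairshare_factor(0.75, 0.25))  # 0.125
```

A higher factor translates into a higher job priority, so light recent users move ahead in the queue. You can inspect your own recorded usage and shares with the `sshare` command on the cluster.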
Can I purchase computational nodes on the cluster to which I have exclusive access?
In general, nodes that comprise the computational cluster are available for general use. If you would like dedicated access to resources purchased, e.g., as part of a grant or with startup funds, it is possible to provide you and any other relevant users (e.g., your research lab, department, etc.) priority access through Slurm that can preempt existing and subsequent requests for those resources. In such cases, during times when your portion of the cluster is unused, those resources would be available for general use.
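One common way this is arranged in Slurm is with two partitions over the same node at different priority tiers: the owning group submits to a high-tier partition restricted to their account, while everyone else uses a low-tier, preemptible partition. The fragment below is a hypothetical sketch (partition, node, and account names are illustrative, not the cluster's actual configuration):

```
# Hypothetical slurm.conf fragment -- names are illustrative.
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

# Owner partition: higher priority tier, restricted to the lab's account
PartitionName=smithlab Nodes=node042 PriorityTier=10 AllowAccounts=smithlab

# General-use partition over the same node: lower tier, jobs here can
# be preempted (requeued) when the owning lab needs the resources
PartitionName=scavenge Nodes=node042 PriorityTier=1 PreemptMode=REQUEUE
```

Under this arrangement, general-use jobs run on the purchased node whenever it is idle but are requeued if the owning group submits work to its partition.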
What about research computing or custom-built systems?
While the computational cluster and VMs are suitable for many research and teaching use cases, in certain instances other solutions may be necessary. The Research and High-Performance Computing team is always available to consult on your individual needs.
- If you require specialized high-performance computational systems, such as a dedicated system with multiple GPUs or other resources to which you need ongoing exclusive access, in many cases we can install such systems in our colocation facility. Doing so can provide benefits such as redundant power, appropriate cooling, secure access, and data backup services; in some cases, we might be able to assist with system-level management (e.g., OS patches, user management, etc.) so that you can concentrate on conducting your research rather than system administration.
- In certain cases, workstations or servers may need to be located in labs or other spaces. Depending on their configuration, we may still be able to assist with certain system administration tasks.