A wide variety of software packages are available on the Firebird HPC cluster. Instructions for specific packages are below, but you must take certain basic steps before using any of them.

It is assumed that you are familiar with the basic instructions for connecting to the cluster, and that before running any software you have spawned an interactive shell on a compute node. Once you have a Slurm-allocated interactive shell session, you are ready to use an available software package. Note that some software has special instructions for starting the interactive shell; where that is the case, they are discussed below in the section for that package.

Working with files

Each individual compute node has a local “scratch volume” mounted to /scratch. The available working space depends on the particular compute node and the workloads it is running or has recently completed. When possible, you should perform actual work within /scratch rather than your home directory. This is for two reasons: first, /scratch is local to each compute node and will therefore perform much faster than your home directory, which is mounted over the cluster’s network; and second, you won’t clutter your home directory with working files and directories.

To review the Firebird HPC cluster’s hardware ecosystem, see Firebird HPC hardware ecosystem.

Since /scratch is shared space, the amount of actual available space on a given node may vary; for this reason, please clean up and delete your temporary files from /scratch when workloads complete, and of course move any results or other needed files back to your home directory. Removing files from /scratch as jobs are completed can be automated and further details for doing so are provided in the Slurm Tutorial.
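For example, a typical interactive workflow might look like the following sketch (the directory and file names are illustrative):

# create a personal working directory on the node-local scratch volume
mkdir -p /scratch/$USER/myjob
cp ~/myjob/input.dat /scratch/$USER/myjob/
cd /scratch/$USER/myjob

# ... run your analysis here ...

# copy results back to your home directory, then clean up
cp results.out ~/myjob/
rm -rf /scratch/$USER/myjob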

Note that we run a script periodically to clean up the temporary files from /scratch that are older than a few days, but please do your part to clean up after yourself!

Working with modules

We offer multiple software packages, and it can be challenging or even impossible to define a single environment that facilitates access to all software packages at the same time. The standard solution to this problem is to use module files that modify your session’s environment in real-time, allowing you to enable or disable various software packages “on the fly.”

If a particular module is required to use a specific software package, it will be mentioned in the section for that package below. Here are some generic commands to display, load, and unload available modules:

# display available modules
module avail

# load a specific module
module load <module>

# show currently loaded modules
module list

# unload a specific module
module unload <module>

# unload all modules
module purge

Amber

Amber is a collection of several programs used to carry out molecular dynamics simulations, especially on biomolecules.

  • To use any graphical programs, ensure the -Y flag is included when connecting to the cluster (note that this requires a working X server on your local system, as described in the basic connection instructions). If you’ll only be using command-line programs, you can connect with ssh netID-laf@firebird.lafayette.edu.
  • Once connected, you must load the appropriate module: module load amber/xx, replacing xx with the major release value (e.g., 24).
  • As with many software packages, most analyses will perform faster when they are directed to work within the local /scratch mount on a compute node. Be sure to copy any working files and output back to your home directory when finished!

In addition to the normal serial versions of Amber programs, MPI parallel versions of certain programs are available as well (specifically, cpptraj.MPI, mdgx.MPI, pmemd.MPI, sander.LES.MPI, and sander.MPI). If you are unfamiliar with running parallel versions of these programs, the recommendation is simply not to, and to use their standard serial counterparts instead. Unless your MPI code has been thoroughly tested, you may experience unpredictable results or crashes. If you do want to experiment with them, you must load the appropriate MPI module first: module load mvapich2/2.3.7
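As a rough sketch of an MPI run (the input, topology, and coordinate file names are illustrative, the process count should match the cores you requested from Slurm, and depending on the MPI configuration srun may be preferred over mpirun):

module load amber/24
module load mvapich2/2.3.7

# run the parallel version of pmemd across 8 MPI processes
mpirun -np 8 pmemd.MPI -O -i md.in -p system.prmtop -c system.inpcrd -o md.out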

Python 2 and 3 / Anaconda

Beginning with EL8, Python has been provided as two separate executables: python2 and python3. Given that Python 2 has been officially discontinued as of January 2020, it is generally recommended that Python 3 be used whenever possible.

If your code calls Python, it may have to be updated to call python3 explicitly, since there is no longer simply a “python” executable. You could, however, create a symlink in your home directory pointing python at python3, for example.
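A minimal sketch of that approach, assuming ~/bin is on your PATH (it often is, but check with echo $PATH):

# create a personal bin directory and point "python" at python3
mkdir -p ~/bin
ln -s "$(which python3)" ~/bin/python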

You are free to request that modules required by your code be installed on the system, or you can install them with pip3 into your home directory. In addition, you are free to install Anaconda in your home directory if you want even greater control over multiple Python environments or projects.
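For example, to install a package into your home directory with pip3 (the package name is just an illustration):

# --user installs under ~/.local rather than the system site-packages
pip3 install --user numpy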

Jupyter Notebook

Jupyter Notebook is a web-based interactive computational environment that lets users author documents combining code (in some 40 different languages), narrative text, and LaTeX equations for simulation, modeling, and visualization.

To use Jupyter Notebook on the cluster, it is recommended that you access it through Open OnDemand (OOD). Assistance with this can be requested via help@lafayette.edu.

Gaussian

Gaussian is a powerful electronic-structure (computational chemistry) software package. To run Gaussian, you must be granted the appropriate permission. If you lack this permission, please email help@lafayette.edu requesting Gaussian access on the Firebird HPC cluster. Once the prerequisite access is available, Gaussian can be enabled by loading the gaussian module with the following commands:

module load gaussian/16
. $g16root/g16/bsd/g16.profile

With the module loaded, you’ll be able to execute Gaussian commands. Integrating these same steps into a Slurm job script will facilitate the submission of Gaussian batch processes.
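A minimal batch-script sketch, assuming a Gaussian input file named water.com (the file name and resource values are illustrative; adjust them for your own job):

#!/bin/bash
#SBATCH --job-name=g16-test
#SBATCH --time=04:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=32gb

module load gaussian/16
. $g16root/g16/bsd/g16.profile

# run Gaussian on the input file; output is written alongside it
g16 water.com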

Mathematica

Mathematica is available through both the command line and as a graphical application. Mathematica has certain operations that are optimized for parallel computing, and as such it can take advantage of multiple cores.

  • To use any graphical programs, ensure the -Y flag is included when connecting to the cluster (note that this requires a working X server on your local system, as described in the basic connection instructions). If you’ll only be using command-line programs, you can connect with ssh netID-laf@firebird.lafayette.edu.
  • Start an interactive shell with, e.g., srun -t 120 -c 8 --mem=16gb --x11 --pty /bin/bash
    • The -t flag takes time in minutes by default and defines the maximum length of your session, so if you intend to work for four hours, you would set it to 240. Your session and work will terminate at the end of this time, so set it reasonably!
    • The -c flag defines the number of available computational cores. Each standard computational node offers 2 CPUs, each with 20 or 26 cores. While “mileage may vary,” it’s unlikely you will see much performance gain beyond 8-12 cores. Requesting more than 20-26 cores would require spanning across two CPUs, further reducing efficiency. It’s important to test your code to determine the most efficient allocation of cores, since setting it arbitrarily high may negatively affect your code’s performance.
    • You can adjust the value of --mem as well to whatever is reasonable for your work (e.g., 32gb, 64gb, etc.). The default computational nodes each have up to 192gb of memory, and if you require more than that, you can connect to the high-memory node (by specifying the --partition flag), which offers up to 768gb of memory: srun -t 240 -c 8 --mem=256gb --partition=himem --x11 --pty /bin/bash (but note that the high-memory node offers up to 36 cores – 18 per CPU, so you may have to adjust your -c value if it is particularly high)
  • You must load the appropriate module: module load mathematica/14
  • You can now launch Mathematica:
    • For the command-line version: math
    • For the graphical version (as explained above), you must have an X server installed and running on your local system and have connected via ssh -Y: mathematica
  • Remember, as noted above, your analyses will be much faster if you work within the /scratch directory! If you need assistance with developing a workflow, request assistance via help@lafayette.edu.
  • Mathematica also offers WolframScript, which is ideal for processing Mathematica workflows in batch. These scripts would typically not be run interactively, instead being submitted to run unattended via Slurm.
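A minimal sketch of running a WolframScript file under Slurm (the script name and resource values are illustrative):

#!/bin/bash
#SBATCH --time=04:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=16gb

module load mathematica/14

# run the script non-interactively; any results are written by the script itself
wolframscript -file analysis.wls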

MATLAB

MATLAB 2024a is installed on the Firebird HPC cluster. You can run it with or without the graphical interface.

  • To use any graphical programs, ensure the -Y flag is included when connecting to the cluster (note that this requires a working X server on your local system, as described in the basic connection instructions). If you’ll only be using command-line programs, you can connect with ssh netID-laf@firebird.lafayette.edu.
  • In addition, you must enable X forwarding from the compute nodes by specifying the --x11 flag in your Slurm script. Here is a sample set of commands (which you should modify to meet your needs):
    srun -t 240 --mem=32gb --x11 --pty /bin/bash
    module load MATLAB/2024a
    matlab
  • To launch the command-line version, you can use a similar approach (or you can call MATLAB directly from a batch script, as sketched after this list):
    srun -t 240 --mem=32gb --pty /bin/bash
    module load MATLAB/2024a
    matlab -nojvm -nodisplay -nosplash
  • All toolboxes are now installed by default.
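A minimal batch-script sketch for running a MATLAB script unattended (the script name myanalysis.m and the resource values are illustrative):

#!/bin/bash
#SBATCH --time=04:00:00
#SBATCH --mem=32gb

module load MATLAB/2024a

# -batch runs the named script non-interactively and exits when it finishes
matlab -batch "myanalysis"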

R

R is available through the regular system command R, which will launch R in interactive command-line mode.

If instead you want to process existing R script files, or submit one or more scripts as batch jobs, you can use the aptly-named Rscript command (see the Rscript documentation for details). You can also use the source() command within an interactive session.

As always, it will generally be much faster to copy your files into /scratch and process them there than to work with them in your home directory while connected to a compute node.
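A minimal batch-submission sketch combining Rscript with the /scratch advice above (the file names and resource values are illustrative):

#!/bin/bash
#SBATCH --time=02:00:00
#SBATCH --mem=16gb

# stage input on the node-local scratch volume
mkdir -p /scratch/$USER/r-job
cp ~/r-job/analysis.R ~/r-job/input.csv /scratch/$USER/r-job/
cd /scratch/$USER/r-job

# run the script non-interactively
Rscript analysis.R

# copy results home and clean up
cp output.rds ~/r-job/
rm -rf /scratch/$USER/r-job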

RStudio

RStudio provides a graphical environment for working with R and is available across the cluster.

  • If you want to use any graphical programs, ensure the -Y flag is included when connecting to the cluster (note that this requires a working X server on your local system, as described in the basic connection instructions). If you are only using command-line programs, you can connect with ssh netID-laf@firebird.lafayette.edu.

Note that when connecting to a compute node via srun for an interactive session through Slurm, add the --x11 flag.

By default, however, RStudio uses hardware rendering, meaning that the first time you attempt to launch rstudio you will likely encounter a blank white screen or a similar failure. Once you have attempted to launch it, it will create a config file within your home directory that can be modified to force software rendering.

You will need to edit .config/RStudio/desktop.ini and add the following lines to the top:

[General]
desktop.renderingEngine=software
font.fixedWidth=DejaVu Sans Mono
general.disableGpuDriverBugWorkarounds=false
general.ignoreGpuBlacklist=false

Once done, you should be able to launch rstudio successfully.

SageMath

SageMath (or Sage for short) is an open-source software package that leverages the Python programming language. Sage v9.4 is installed on the HPC cluster.

As installed, Sage can be run as standalone software, or it can be passed Python (.py) files.

To launch the command-line version:

module load sage
sage
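To run an existing Python file through Sage instead of the interactive prompt (the file name is illustrative):

module load sage
sage myscript.py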

Stata/MP

Stata/MP can leverage up to 8 computational cores and is available both from the command line and through a graphical interface. Stata/MP is nice because its routines are designed to take advantage of multiple cores automatically, without the need for special syntax.

  • If you want to use any graphical programs, ensure the -Y flag is included when connecting to the cluster (note that this requires a working X server on your local system, as described in the basic connection instructions). If you are only using command-line programs, you can connect with ssh netID-laf@firebird.lafayette.edu.
  • Start an interactive shell with, e.g., srun -t 120 -c 8 --mem=16gb --x11 --pty /bin/bash
    • The -t flag takes time in minutes by default and defines the maximum length of your session, so if you intend to work for four hours, you would set it to 240. Your session will terminate at the end of this time, so set it reasonably!
    • The -c flag defines the number of available computational cores. Our license permits up to 8, and since Stata’s routines can automatically leverage them, there is little reason not to allocate them to your session.
    • You can adjust the value of --mem as well to whatever is reasonable for your work (e.g., 32gb, 64gb, etc.). The default computational nodes each have up to 192gb of memory, and if you require more than that, you can connect to the high-memory node (by specifying the --partition flag), which offers up to 768gb of memory: srun -t 240 -c 8 --mem=256gb --partition=himem --x11 --pty /bin/bash
    • The --x11 flag enables you to launch the graphical version, should you want to. This flag is not needed if you are only running the command-line version.
  • You must load the appropriate module: module load stata/17
  • You can now launch Stata:
    • For the command-line version: stata-mp
    • For the graphical version: xstata-mp
  • Remember, your analyses will be much faster if you work within the /scratch directory! If you need assistance with developing a workflow, request assistance via help@lafayette.edu.
  • It is also possible to run Stata in batch mode by processing files non-interactively. More information can be found on Stata’s FAQ, but essentially it is a matter of: stata -b do filename &
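A minimal Slurm batch-script sketch for the same idea (the do-file name and resource values are illustrative):

#!/bin/bash
#SBATCH --time=04:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=16gb

module load stata/17

# run the do-file non-interactively; output is written to a .log file
stata-mp -b do analysis.do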

VMD

VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.

To launch the command-line version:

module load vmd/1.9.3
vmd -dispdev text

To launch the graphical version, assuming you have a running X server on your local system and are connected to the cluster with ssh -Y:

module load vmd/1.9.3
vmd

One GPU node is available on the cluster, as detailed in the Firebird HPC hardware ecosystem. This resource is particularly useful when working with the graphical version of VMD, especially for complex renderings.
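If you want to try the GPU node, you would request it through Slurm’s GPU options. The partition name below is an assumption, so confirm the actual partition and GPU names with sinfo before using it:

# assumption: the GPU partition is named "gpu" -- verify with sinfo
srun -t 240 --mem=32gb --partition=gpu --gres=gpu:1 --x11 --pty /bin/bash
module load vmd/1.9.3
vmd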
