A wide variety of software packages are available on the HPC cluster. Instructions for specific packages are below, but you must take certain basic steps before using any of them.

It is assumed that you are familiar with the basic instructions for connecting to the cluster, and that prior to running any software you have spawned an interactive shell. Once you have allocated an interactive shell session through Slurm, you are ready to use available software. Note that some software might have special instructions for starting the interactive shell; if so, they will be discussed below within the section for each software package.

Working with files

Each individual compute node has a local “scratch volume” mounted at /scratch, providing approximately 850gb of working space. When possible, you should perform most of your actual work within /scratch rather than your home directory, for two reasons: first, /scratch is on a physical drive local to each compute node and will therefore be much faster than your home directory, which is mounted over the network; and second, you won’t clutter your home directory with temporary files.

Since /scratch is shared space, the amount of actual available space on a given node may vary; for this reason, please clean up and delete your temporary files from /scratch when finished, and of course move any results or other output files back to your home directory. Removing files from /scratch as jobs are completed can be automated and further details for doing so are provided in the Slurm Tutorial.
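
The overall pattern looks roughly like the following sketch; the directory and file names below are placeholders for your own:

# create a working directory on the node-local scratch volume
mkdir -p /scratch/$USER/myjob

# copy input files over from your home directory
cp ~/input.dat /scratch/$USER/myjob/

# ... do your work from within /scratch/$USER/myjob ...

# copy results back to your home directory, then clean up
cp /scratch/$USER/myjob/results.out ~/
rm -rf /scratch/$USER/myjob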

Note that we run a script periodically to clean out files older than a few days from /scratch, but please do your part to clean up after yourself!

Working with modules

Like most clusters, we offer multiple software packages, and it can be challenging or even impossible to define a single environment that facilitates access to all software packages at the same time. The standard solution to this problem is to use module files that modify your particular user environment in real time, allowing you to enable or disable various software packages “on the fly.”

If a particular module is required to use a specific software package, it will be mentioned in the section below that pertains to each individual package. In the meantime, here are some generic commands to display, load, and unload available modules:

# display available modules
module avail

# load a specific module
module load <module>

# show currently loaded modules
module list

# unload a specific module
module unload <module>

# unload all modules
module purge

Amber

Amber is a collection of several programs used to carry out molecular dynamics simulations, especially on biomolecules.

  • If you want to use any graphical programs, connect to the cluster with ssh -Y hpc.lafayette.edu (note that this requires a working X server on your local system as described in the basic connection instructions), or if you are only using command-line programs you can connect with ssh hpc.lafayette.edu.
  • Once connected, you must load the appropriate module: module load amber/18
  • As with other programs, many analyses will be faster if you are working within /scratch on a compute node, rather than within your home directory. Just make sure to copy any working files and output back to your home directory when finished!

In addition to the normal serial versions of Amber programs, MPI parallel versions of certain programs are available as well (specifically, cpptraj.MPI, mdgx.MPI, pmemd.MPI, sander.LES.MPI, and sander.MPI). If you are unfamiliar with running the parallel versions of these programs, the recommendation is simply to avoid them and use their standard serial counterparts instead; unless your MPI code has been thoroughly tested, you may experience unpredictable results or crashes. If you do want to experiment with them, you need to load the appropriate MPI module first: module load mvapich2-2.3.1/intel (a rough example follows below).
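
As a hedged sketch only, an MPI run of pmemd from within an interactive allocation or a batch script might look like the following; the rank count should match your Slurm allocation, and the file names are the conventional Amber defaults used here purely as placeholders:

# load the MPI and Amber modules
module load mvapich2-2.3.1/intel
module load amber/18

# run pmemd across 8 MPI ranks; -O overwrites existing output, and
# -i/-o/-p/-c name the input, output, topology, and coordinate files
mpirun -np 8 pmemd.MPI -O -i md.in -o md.out -p prmtop -c inpcrd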

Python 2 and 3 / Anaconda

Beginning with Red Hat / CentOS 8, Python is now provided as two separate executables: python2 and python3. Given that Python 2 has been officially discontinued as of January 2020, it is generally recommended that Python 3 be used whenever possible.

If your code calls Python, it may have to be updated to call python3 explicitly, since there is no longer simply a python executable. You could, however, create a symlink in your home directory pointing python at python3, for example.
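
For example, one minimal approach (assuming a bash shell; the ~/bin directory name is just a convention) is:

# create a personal bin directory and point a python symlink at python3
mkdir -p ~/bin
ln -s $(which python3) ~/bin/python

# put ~/bin on your PATH (add this line to ~/.bashrc to make it permanent)
export PATH=$HOME/bin:$PATH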

You are free to request that modules required by your code be installed on the system, or you can install them with pip3 into your home directory. In addition, you are free to install Anaconda in your home directory if you want even greater control over multiple Python environments or projects.
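
For instance, to install a module into your home directory with pip3 (numpy here is just an example package name):

# install a package under ~/.local rather than system-wide
pip3 install --user numpy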

Jupyter Notebook

Jupyter Notebook is a web-based interactive computational environment that enables users to author documents combining code (in some 40 different languages), narrative text, and LaTeX equations, for the purpose of simulation, modeling, and visualization.

To utilize Jupyter Notebook on the cluster, it is recommended that you access it through Open OnDemand (OOD). Assistance with this can be requested via help@lafayette.edu.

Gaussian

Gaussian is a powerful electronic structure modeling package. To be able to run Gaussian, you must be a member of the appropriate group (send an email to help@lafayette.edu asking to be added). Once you have access, you can set up Gaussian with the following commands:

module load gaussian/16
. $g16root/g16/bsd/g16.profile

Then you can launch Gaussian itself (typically via the g16 command). These same steps can be integrated easily into a Slurm script for batch processing, as sketched below.
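
A minimal batch sketch follows; the resource requests and the input file name myjob.com are placeholders to adapt to your own job, and the script would be submitted with sbatch:

#!/bin/bash
#SBATCH -t 240
#SBATCH -c 8
#SBATCH --mem=32gb

module load gaussian/16
. $g16root/g16/bsd/g16.profile

# run Gaussian on a placeholder input file; output goes to myjob.log
g16 myjob.com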

Mathematica

Mathematica is available through both the command line and as a graphical application. Mathematica has certain operations that are optimized for parallel computing, and as such it can take advantage of multiple cores.

  • If you want to use any graphical programs, connect to the cluster with ssh -Y hpc.lafayette.edu (note that this requires a working X server on your local system as described in the basic connection instructions), or if you are only using command-line programs you can connect with ssh hpc.lafayette.edu.
  • Start an interactive shell with, e.g., srun -t 120 -c 8 --mem=16gb --x11 --pty /bin/bash
    • The -t flag specifies time (in minutes by default) and defines the maximum length of your session, so if you intend to work for four hours, you would set it to 240. Your session and work will terminate at the end of this time, so set it reasonably!
    • The -c flag defines the number of available computational cores. Each standard computational node offers 2 CPUs, each with 20 or 26 cores. While “mileage may vary,” it’s unlikely you will see much performance gain beyond 8-12 cores. Requesting more than 20-26 cores would require spanning across two CPUs, further reducing efficiency. It’s important to test your code to determine the most efficient allocation of cores, since setting it arbitrarily high may negatively affect your code’s performance.
    • You can adjust the value of --mem as well to whatever is reasonable for your work (e.g., 32gb, 64gb, etc.). The default computational nodes each have up to 192gb of memory, and if you require more than that, you can connect to the high-memory node (by specifying the --partition flag), which offers up to 768gb of memory: srun -t 240 -c 8 --mem=256gb --partition=himem --x11 --pty /bin/bash (but note that the high-memory node offers up to 36 cores – 18 per CPU, so you may have to adjust your -c value if it is particularly high)
  • You must load the appropriate module: module load mathematica/12.1
  • You can now launch Mathematica:
    • For the command-line version: math
    • For the graphical version (assuming you have an installed X server on your local system -and- you connected via ssh -Y): mathematica
  • Remember, as noted above, your analyses will be much faster if you work within the /scratch directory! If you need assistance with developing a workflow, request assistance via help@lafayette.edu.
  • Mathematica also offers WolframScript, which is ideal for processing Mathematica workflows in batch. These scripts would typically not be run interactively, instead being submitted to run unattended via Slurm (see the sketch below).
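
A hedged sketch of such a batch submission follows; the resource requests and the script name myworkflow.wls are placeholders, and you should confirm the wolframscript invocation against your own workflow. Submit the script with sbatch.

#!/bin/bash
#SBATCH -t 240
#SBATCH -c 8
#SBATCH --mem=16gb

module load mathematica/12.1

# run a placeholder WolframScript file unattended
wolframscript -file myworkflow.wls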

MATLAB

MATLAB 2020a is installed on the cluster. You can run it with or without the graphical interface.

If you wish to use the graphical version, you must connect to the cluster with ssh -Y hpc.lafayette.edu and have a working X server on your local system. In addition, you must enable X forwarding from the compute nodes by specifying the --x11 flag in your Slurm script. Here is a sample set of commands (which you likely should modify for your own needs):

srun -t 240 --mem=32gb --x11 --pty /bin/bash
module load MATLAB/2020a
matlab

To launch the command-line version, you can use a similar approach (or you can call MATLAB directly from batch processing scripts, as sketched further below):

srun -t 240 --mem=32gb --pty /bin/bash
module load MATLAB/2020a
matlab -nojvm -nodisplay -nosplash
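
For unattended batch processing, a minimal Slurm sketch might look like the following; the resource requests and the script name myscript.m are placeholders:

#!/bin/bash
#SBATCH -t 240
#SBATCH --mem=32gb

module load MATLAB/2020a

# run a placeholder MATLAB script non-interactively, then exit
matlab -nodisplay -nosplash -r "run('myscript.m'); exit"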

The following toolboxes are installed by default:

  • Communications
  • Deep Learning
  • DSP System
  • Embedded Coder
  • Global Optimization
  • Image Acquisition
  • Image Processing
  • MATLAB Coder
  • Navigation
  • Optimization
  • ROS
  • Robotics System
  • Signal Processing
  • Simscape
  • Simscape Electrical
  • Simulink Coder
  • Simulink Design Optimization
  • Statistics and Machine Learning
  • Text Analytics
  • Wavelet

If you would like additional toolboxes installed, please send your request to help@lafayette.edu.

R

R is available through the regular system command R, which will launch R in interactive command-line mode.

If instead you want to process existing R script files, or submit one or more scripts as batch jobs, you can use the aptly-named Rscript command; see the Rscript documentation for details. You can also use the source() command within an interactive session. A brief example follows below.
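
For example (the script and file names are placeholders):

# run an R script non-interactively
Rscript myanalysis.R

# pass arguments to the script, retrievable within R via commandArgs()
Rscript myanalysis.R input.csv output.csv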

As always, it will generally be much faster to copy your files into /scratch and process them there than to work with them in your home directory while connected to a compute node.

RStudio

RStudio provides a graphical environment for working with R and is available across the cluster.

  • If you want to use any graphical programs, connect to the cluster with ssh -Y hpc.lafayette.edu (note that this requires a working X server on your local system as described in the basic connection instructions), or if you are only using command-line programs you can connect with ssh hpc.lafayette.edu.

Note that when connecting to a compute node via srun for an interactive session through Slurm, add the --x11 flag.

Note that RStudio defaults to hardware rendering, meaning that the first time you attempt to launch rstudio, you will likely encounter a blank white screen or a similar failure. That first launch attempt will, however, create a config file within your home directory that can be modified to force software rendering.

You will need to edit .config/RStudio/desktop.ini and add the following lines to the top:

[General]
desktop.renderingEngine=software
font.fixedWidth=DejaVu Sans Mono
general.disableGpuDriverBugWorkarounds=false
general.ignoreGpuBlacklist=false

Once done, you should be able to launch rstudio successfully.

SageMath

SageMath (or Sage for short) is an open-source mathematics software package built on the Python programming language. Sage v9.4 is installed on the HPC cluster.

As installed, Sage can be run as standalone software, or it can be passed Python (.py) files.

To launch the command-line version:

module load sage
sage
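
To run an existing Python (.py) file through Sage rather than working interactively (the file name is a placeholder):

module load sage
sage myscript.py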

Stata/MP

Stata/MP can leverage up to 8 computational cores and is available both from the command line and through a graphical interface. A particular advantage of Stata/MP is that its routines are designed to take advantage of multiple cores automatically, without the need for special syntax.

  • If you want to use any graphical programs, connect to the cluster with ssh -Y hpc.lafayette.edu (note that this requires a working X server on your local system as described in the basic connection instructions), or if you are only using command-line programs you can connect with ssh hpc.lafayette.edu.
  • Start an interactive shell with, e.g., srun -t 120 -c 8 --mem=16gb --x11 --pty /bin/bash
    • The -t flag specifies time (in minutes by default) and defines the maximum length of your session, so if you intend to work for four hours, you would set it to 240. Your session will terminate at the end of this time, so set it reasonably!
    • The -c flag defines the number of available computational cores. Our license permits up to 8, and since Stata’s routines can automatically leverage them, there is little reason not to allocate them to your session.
    • You can adjust the value of --mem as well to whatever is reasonable for your work (e.g., 32gb, 64gb, etc.). The default computational nodes each have up to 192gb of memory, and if you require more than that, you can connect to the high-memory node (by specifying the --partition flag), which offers up to 768gb of memory: srun -t 240 -c 8 --mem=256gb --partition=himem --x11 --pty /bin/bash
    • The --x11 flag enables you to launch the graphical version, should you want to. This flag is not needed if you are only running the command-line version.
  • You must load the appropriate module: module load stata/17
  • You can now launch Stata:
    • For the command-line version: stata-mp
    • For the graphical version: xstata-mp
  • Remember, your analyses will be much faster if you work within the /scratch directory! If you need assistance with developing a workflow, request assistance via help@lafayette.edu.
  • It is also possible to run Stata in batch mode by processing files non-interactively. More information can be found in Stata’s FAQ, but essentially it is a matter of: stata -b do filename & (on the cluster, use stata-mp in place of stata). A Slurm-based sketch follows below.
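
A minimal batch sketch follows; the resource requests and the do-file name myanalysis.do are placeholders to adapt, and the script would be submitted with sbatch:

#!/bin/bash
#SBATCH -t 240
#SBATCH -c 8
#SBATCH --mem=16gb

module load stata/17

# run a placeholder do-file in batch mode; output is written to myanalysis.log
stata-mp -b do myanalysis.do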

VMD

VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.

To launch the command-line version:

module load vmd/1.9.3
vmd -dispdev text
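
To process a VMD analysis script non-interactively in text mode (the script name is a placeholder; end the script with a quit command so VMD exits when it finishes):

module load vmd/1.9.3
vmd -dispdev text -e myanalysis.tcl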

To launch the graphical version, assuming you have a running X server on your local system and are connected to the cluster with ssh -Y:

module load vmd/1.9.3
vmd

One GPU node is available on the cluster, as detailed on the HPC hardware ecosystem page. This resource is particularly useful when working with the graphical version of VMD, especially for complex renderings.
