A wide variety of software packages are available on the computational cluster. Instructions for specific packages are below, but you must take certain basic steps before using any of them.

It is assumed that you are familiar with the basic instructions to connect to the cluster, and that prior to running any software you have spawned an interactive shell. Once you have allocated an interactive shell session through SLURM, you are ready to use available software. Note that some software might have special instructions for starting the interactive shell; if so, they will be discussed below within the section for each software package.

Working with files

Each computational node has roughly 850 GB of local disk space mounted at /scratch. When possible, you should perform your actual work within /scratch rather than your home directory, for two reasons: first, /scratch is on a physical drive local to each computational node, so it performs much faster than your home directory, which is mounted over the network; and second, you won't clutter your home directory with temporary files. Because /scratch is shared space, the amount actually available on a given node may vary; for this reason, please clean up and delete your temporary files from /scratch when finished, and of course move any results or other output back into your home directory. We periodically run a script that removes files older than a few days from /scratch, so please do your part to clean up after yourself!
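
For example, a typical /scratch workflow might look like the following sketch (the directory and file names are placeholders to adapt to your own work):

# create a personal working directory in the node's local scratch space
mkdir -p /scratch/$USER/myjob
cp ~/input_data.txt /scratch/$USER/myjob/
cd /scratch/$USER/myjob

# ... run your analysis here ...

# copy results back to your home directory and clean up
cp results.txt ~/
cd ~
rm -rf /scratch/$USER/myjob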

Working with modules

Like most clusters, we offer multiple software packages, and it can be challenging or even impossible to define a single environment that facilitates access to all software packages at the same time. The standard solution to this problem is to use module files that modify your particular user environment in real time, allowing you to enable or disable various software packages “on the fly.”

If a particular module is required to use a specific software package, it will be mentioned in that package's section below. Here are some generic commands to display, load, and unload available modules:

# display available modules
module avail

# load a specific module
module load <module>

# show currently loaded modules
module list

# unload a specific module
module unload <module>

# unload all modules
module purge

Amber

Amber is a collection of several programs used to carry out molecular dynamics simulations, especially on biomolecules.

  • If you want to use any graphical programs, connect to the cluster with ssh -Y hpc.lafayette.edu (note that this requires a working X server on your local system as described in the basic connection instructions), or if you are only using command-line programs you can connect with ssh hpc.lafayette.edu.
  • Once connected, you must load the appropriate module: module load amber/18
  • As with other programs, many analyses will be faster if you are working within /scratch on a compute node, rather than within your home directory. Just make sure to copy any working files and output back to your home directory when finished!

In addition to the normal serial versions of Amber programs, MPI parallel versions of certain programs are available as well (specifically, cpptraj.MPI, mdgx.MPI, pmemd.MPI, sander.LES.MPI, and sander.MPI). If you are unfamiliar with running these programs in parallel, we recommend sticking with their standard serial counterparts; unless your MPI workflow has been thoroughly tested, you may experience unpredictable results or crashes. If you do want to experiment with them, load the appropriate MPI module first: module load mvapich2-2.3.1/intel
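
As a rough sketch only (the input, topology, and coordinate file names are placeholders, and the appropriate process count depends on your allocation), a parallel sander run might look like:

# load Amber and the MPI stack
module load amber/18
module load mvapich2-2.3.1/intel

# run the MPI version of sander on 4 processes
mpirun -np 4 sander.MPI -O -i md.in -o md.out -p prmtop -c inpcrd -r restrt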

Python 2 and 3 / Anaconda

Beginning with Red Hat / CentOS 8, Python is now provided as two separate executables: python2 and python3. Given that Python 2 has been officially discontinued as of January 2020, it is generally recommended that Python 3 be used whenever possible.

If your code calls Python, it may have to be updated to call python3 explicitly, since there is no longer simply a python executable. You could, however, create a symlink in your home directory pointing python at python3, for example.
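
For example, one minimal way to set this up (assuming ~/bin exists and is on your PATH; adjust as needed for your shell configuration):

# create a personal "python" that points at python3
mkdir -p ~/bin
ln -s $(which python3) ~/bin/python

# make sure ~/bin is on your PATH, e.g. in ~/.bashrc:
# export PATH=$HOME/bin:$PATH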

You are free to request that modules required by your code be installed on the system, or you can install them with pip3 into your home directory. In addition, you are free to install Anaconda in your home directory if you want even greater control over multiple Python environments or projects.
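
For instance, to install a module into your home directory with pip3 (the package name below is just an example):

# install a package under ~/.local rather than system-wide
pip3 install --user numpy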

Gaussian

Gaussian is a powerful electronic structure (quantum chemistry) software package. To be able to run Gaussian, you must be a member of the appropriate group (send an email to help@lafayette.edu asking to be added). Once that is done, you can set up your environment to run Gaussian with the following commands:

module load gaussian/16
. $g16root/g16/bsd/g16.profile

You can then run the g16 command on your input file. These same steps can be integrated easily into a Slurm script for batch processing, as sketched below.
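
As a rough sketch, a Slurm batch script for Gaussian might look like the following (the resource requests and input file name are placeholders to adjust for your own job):

#!/bin/bash
#SBATCH --time=240
#SBATCH -c 8
#SBATCH --mem=16gb

module load gaussian/16
. $g16root/g16/bsd/g16.profile

# run Gaussian 16 on the input file (placeholder name)
g16 myinput.com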

Mathematica

Mathematica is available through both the command line and GUI. Mathematica has certain operations that are optimized for parallel computing, and as such it can take advantage of multiple cores.

  • Connect to the cluster using either ssh hpc.lafayette.edu (if you want to use Mathematica through the command line) or ssh -Y hpc.lafayette.edu (if you want to use the GUI version)
  • Start an interactive shell with, e.g., srun -t 120 -c 8 --mem=16gb --x11 --pty /bin/bash
    • The -t flag defines the maximum length of your session and is interpreted in minutes by default, so if you intend to work for four hours, you would set it to 240. Your session will terminate at the end of this time, so set it reasonably!
    • The -c flag defines the number of available computational cores. Each standard computational node offers 2 CPUs with 20 cores each (for a total of 40 cores). While “your mileage may vary,” it’s unlikely you will see much performance gain beyond 8-12 cores. More than 20 would have to span across two CPUs, further reducing efficiency. It’s important to test your code to determine the most efficient allocation of cores, since setting it arbitrarily high may negatively affect your code’s performance.
    • You can adjust the value of --mem as well to whatever is reasonable for your work (e.g., 32gb, 64gb, etc.). The default computational nodes each have up to 192gb of memory, and if you require more than that, you can connect to the high-memory node (by specifying the --partition flag), which offers up to 768gb of memory: srun -t 240 -c 8 --mem=256gb --partition=himem --x11 --pty /bin/bash (but note that the high-memory node offers up to 36 cores – 18 per CPU – rather than 40, so you may have to adjust your -c value if it is particularly high)
  • You must load the appropriate module: module load mathematica/12.0
  • You can now launch Mathematica:
    • For the command-line version: math
    • For the GUI version (assuming you have a working X server installed on your local system -and- you connected via ssh -Y): mathematica
  • Remember, as noted above, your analyses will be much faster if you work within the /scratch directory! If you need assistance with developing a workflow, reach out to simmsj@lafayette.edu.
  • Mathematica also offers WolframScript, which is ideal for processing Mathematica workflows in batch. These scripts would typically not be run interactively, but instead submitted to run unattended via SLURM.
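
As a rough sketch (the script name and resource requests are placeholders), a WolframScript job could be submitted with a Slurm batch script along these lines:

#!/bin/bash
#SBATCH --time=240
#SBATCH -c 8
#SBATCH --mem=16gb

module load mathematica/12.0

# run a Wolfram Language script non-interactively
wolframscript -file analysis.wls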

MATLAB

MATLAB 2020a is installed on the cluster. You can run it with or without the GUI front-end.

If you wish to use the GUI version, you must connect to the cluster with ssh -Y hpc.lafayette.edu and have a working X server on your local system. In addition, you must enable X forwarding from the compute nodes by specifying the --x11 flag to Slurm. Here is a sample set of commands (which you likely should modify for your own needs):

srun -t 240 --mem=32gb --x11 --pty /bin/bash
module load MATLAB/2020a
matlab

To launch the non-GUI command-line version, you can use a similar approach (or you can call matlab directly from batch processing scripts, as sketched below):

srun -t 240 --mem=32gb --pty /bin/bash
module load MATLAB/2020a
matlab -nojvm -nodisplay -nosplash
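
For unattended batch processing, a minimal Slurm script might look like the following sketch (the script name and resource requests are placeholders; the -batch option runs the named script non-interactively and then exits):

#!/bin/bash
#SBATCH --time=240
#SBATCH --mem=32gb

module load MATLAB/2020a

# run myscript.m without the GUI and exit when it finishes
matlab -batch "myscript"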

The following toolboxes are installed by default:

  • Communications
  • Deep Learning
  • DSP System
  • Embedded Coder
  • Global Optimization
  • Image Acquisition
  • Image Processing
  • MATLAB Coder
  • Navigation
  • Optimization
  • ROS
  • Robotics System
  • Signal Processing
  • Simscape
  • Simscape Electrical
  • Simulink Coder
  • Simulink Design Optimization
  • Statistics and Machine Learning
  • Text Analytics
  • Wavelet

If you would like additional toolboxes installed, please send your request to help@lafayette.edu.

R

R is available through the regular system command R, which will launch R in interactive command-line mode.

If instead you want to process existing R script files, or submit one or more scripts as batch jobs, you can use the aptly named Rscript command (see the Rscript documentation for details). You can also use the source() command within an interactive session.
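
For example (the script name and arguments are placeholders):

# run an R script non-interactively, passing optional arguments
Rscript myanalysis.R input.csv output.csv

# or, from within an interactive R session:
# > source("myanalysis.R")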

As always, it will generally be much faster to copy your files into /scratch and process them there than to work with them in your home directory while connected to a compute node.

RStudio

RStudio is a GUI for working with R and is available on all nodes of the cluster. Since it is a graphical program, you must connect to the cluster with ssh -Y (and include the --x11 flag when connecting to a compute node via srun for an interactive session through Slurm) in order to forward the X11 display to your local system. RStudio defaults to hardware rendering, however, so the first time you attempt to launch rstudio you will likely encounter a blank white screen or a similar failure. That first attempt will nonetheless create a config file within your home directory that can be modified to force software rendering.

You will need to edit .config/RStudio/desktop.ini and add the following lines to the top:

[General]
desktop.renderingEngine=software
font.fixedWidth=DejaVu Sans Mono
general.disableGpuDriverBugWorkarounds=false
general.ignoreGpuBlacklist=false

Once done, you should be able to launch rstudio successfully.

Stata/MP

Stata/MP 16 can leverage up to 8 computational cores and is available both from the command line and through a GUI. A key advantage of Stata/MP is that its routines are designed to take advantage of multiple available cores automatically, without the need for special syntax.

  • Connect to the cluster using either ssh hpc.lafayette.edu (if you want to use Stata through the command line) or ssh -Y hpc.lafayette.edu (if you want to use the GUI version)
  • Start an interactive shell with, e.g., srun -t 120 -c 8 --mem=16gb --x11 --pty /bin/bash
    • The -t flag defines the maximum length of your session and is interpreted in minutes by default, so if you intend to work for four hours, you would set it to 240. Your session will terminate at the end of this time, so set it reasonably!
    • The -c flag defines the number of available computational cores. Our license permits up to 8, and since Stata’s routines can automatically leverage them, there is little reason not to allocate them to your session.
    • You can adjust the value of --mem as well to whatever is reasonable for your work (e.g., 32gb, 64gb, etc.). The default computational nodes each have up to 192gb of memory, and if you require more than that, you can connect to the high-memory node (by specifying the --partition flag), which offers up to 768gb of memory: srun -t 240 -c 8 --mem=256gb --partition=himem --x11 --pty /bin/bash
    • The --x11 flag enables you to launch the GUI version, should you want to. If you are only running the command-line version, you do not need this.
  • You must load the appropriate module: module load stata/16
  • You can now launch Stata:
    • For the command-line version: stata-mp
    • For the GUI version (assuming you have a working X server installed on your local system -and- you connected via ssh -Y): xstata-mp
  • Remember, as noted above, your analyses will be much faster if you work within the /scratch directory! If you need assistance with developing a workflow, reach out to simmsj@lafayette.edu.
  • It is also possible to run Stata in batch mode by processing do-files non-interactively. More information can be found in Stata's FAQ, but essentially it is a matter of: stata-mp -b do filename &
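
As a rough sketch, the same batch-mode approach can be wrapped in a Slurm script (the do-file name and resource requests are placeholders):

#!/bin/bash
#SBATCH --time=240
#SBATCH -c 8
#SBATCH --mem=16gb

module load stata/16

# process the do-file non-interactively
stata-mp -b do myanalysis.do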

VMD

VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.

To launch the command-line (non-graphical) version:

module load vmd/1.9.3
vmd -dispdev text
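
Text mode is also handy for running analysis scripts unattended; for example (the script name is a placeholder, and the -e flag tells VMD to execute the given Tcl script at startup):

module load vmd/1.9.3
# include an "exit" command at the end of your script so vmd terminates when done
vmd -dispdev text -e analysis.tcl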

To launch the graphical version, assuming you have a running X server on your local system and are connected to the cluster with ssh -Y:

module load vmd/1.9.3
vmd

But note that currently no GPUs are available on the cluster, so the usefulness of the graphical version, particularly for complex renderings, may be limited. Also, you may see warnings or other error messages appear in your terminal window as you are using the GUI – this is expected, but should not affect the use of the program.
