Quick Start
A collection of information intended for users new to either HPC computing in general or to the FRCE cluster itself.
What is an HPC cluster?
A computer cluster is a set of loosely connected computers (referred to as nodes) that work together so that, in many respects, they can be viewed as a single system. The cluster has one or more head (or login) nodes, which users can connect to directly, and a number of compute nodes, all performing the same or similar tasks. Compute nodes are generally high-performance computers with a significant amount of memory, access to large amounts of high-speed storage, and connections to one or more networks. Users generally do not have direct access to the compute nodes but instead use a job scheduler to submit tasks to the system.
The nodes in a cluster are generally not identical. Many are typical servers with only CPU resources, but systems with attached GPUs are available. In addition, some nodes may be more specialized, such as having much more memory or processors designed for particular workflows. The overall power of a cluster is almost directly proportional to the number of nodes in the cluster, allowing for extremely large amounts of computing resources. A list of the top 500 supercomputers in the world is released every six months; all computers on this list are HPC clusters running Linux. Currently (October 2024), the NIH Biowulf cluster is ranked #478, although this will change as new nodes have just been installed.
Diagrams showing the architecture of the FRCE cluster and the NIH Biowulf cluster are available. Each gives a high-level picture of the cluster components.
A lecture produced by the ABCS provides a quick-start guide to using FRCE. Several other lectures dig into specific use cases to run on FRCE.
Logging into FRCE
Access to FRCE is through ssh while on the NIH network, either on-site or while connected through VPN. Several ssh clients are available for Windows systems. Macs and Linux systems have native ssh clients.
Access to FRCE is through the head node batch.ncifcrf.gov or, alternatively, batch2.ncifcrf.gov. ssh to either server and, at the login prompt, enter your NIH user name. This should be lower-case and without the NIH/ prefix. The password is your NIH password.
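For example, from a Mac or Linux terminal (the username shown is a placeholder):

$ ssh username@batch.ncifcrf.gov
username@batch.ncifcrf.gov's password: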
Most ssh clients allow a default username and an SSH key pair to be configured, enabling automatic login without entering the username and password each time. This simplifies the login process and remains secure.
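As a sketch using OpenSSH, the setup involves generating a key pair, copying the public key to the cluster, and adding an entry to ~/.ssh/config (the username, key file name, and the "frce" alias below are all placeholders):

# one-time setup on the local machine
$ ssh-keygen -t ed25519
$ ssh-copy-id username@batch.ncifcrf.gov

# entry in ~/.ssh/config so that "ssh frce" connects directly
Host frce
    HostName batch.ncifcrf.gov
    User username
    IdentityFile ~/.ssh/id_ed25519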
The home directory and other storage
The storage page provides more detail, but essentially there are two main storage areas for data. The first is the account's home directory, which has a 48GB quota and is accessible only to the account holder. The second is a scratch area at /scratch/cluster_scratch/username with a 5TB quota. Permissions on this directory are initially set to allow access only to the user account, but these permissions can be loosened.
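As a sketch using standard Linux commands (nothing FRCE-specific), current usage can be checked with du and the scratch directory opened to group members with chmod:

# check how much space is in use
$ du -sh $HOME /scratch/cluster_scratch/$USER

# allow members of the directory's group to read and traverse the scratch area
$ chmod g+rx /scratch/cluster_scratch/$USER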
Running jobs
Jobs should never be run directly on the head node but scheduled using the SLURM resource manager. Processes on the head node are subject to a 10 CPU-minute limit and will be killed if this limit is reached.
Interactive access to compute nodes is available when the tasks are best run from a shell prompt.
# start a session on a compute node
$ srun --pty -p norm --ntasks=1 bash
$ hostname
fsitgl-hpc058p.ncifcrf.gov
$ exit

# same but with X11 forwarding enabled
$ srun --pty -p norm --x11 --ntasks=1 bash
$ hostname
$ echo $DISPLAY
$ exit
For longer running or repetitive tasks, a shell script can be written and submitted to a partition. For example, the script hello.sh might read:
#!/bin/sh
#SBATCH --job-name=HelloWorld
#SBATCH --output=hello.log
#SBATCH --time=1:00
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --mem=10G
#SBATCH --mail-type=END
#SBATCH --mail-user=username@mail.nih.gov

echo Hello, World!
The script is then submitted using sbatch hello.sh, and a JOBID number will be displayed. The status of the job can be checked using squeue --me or squeue -u $USER. After the job completes, resource usage can be queried with seff and sacct. These numbers can be used to tune resource requests on future jobs.
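A typical submit-and-review sequence might look like the following (the job ID 1234567 is a placeholder):

$ sbatch hello.sh
Submitted batch job 1234567
$ squeue --me
$ seff 1234567
$ sacct -j 1234567 --format=JobID,JobName,Elapsed,MaxRSS,State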
The process for requesting a node with a GPU is very similar. For interactive access, the command to request a single Nvidia P100 GPU is:
srun --pty -p gpu --gres=gpu:p100:1 --x11 --ntasks=1 bash
For batch access, replace the line
#SBATCH --partition=short
with
#SBATCH --partition=gpu
#SBATCH --gres=gpu:p100:1
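Putting this together, a minimal GPU batch script might look like the following sketch (the job name, log file, and the nvidia-smi command are just illustrative):

#!/bin/sh
#SBATCH --job-name=HelloGPU
#SBATCH --output=hello-gpu.log
#SBATCH --time=1:00
#SBATCH --partition=gpu
#SBATCH --gres=gpu:p100:1
#SBATCH --ntasks=1
#SBATCH --mem=10G

# report the GPU allocated to the job
nvidia-smi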
The FRCE cluster has three different types of Nvidia GPUs: p100, v100, and a100. Any of these can be requested in the gres option. The final digit in the option is the number of GPUs being requested.
Be aware that many programs cannot utilize more than one GPU and that the number of GPUs per node is limited: 3 per node for P100s, 8 per node for V100s, and 2 per node for A100s.
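For example, a request for two V100s in an interactive session might look like this (worthwhile only if the program can actually use both GPUs):

$ srun --pty -p gpu --gres=gpu:v100:2 --ntasks=1 bash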
Partition (or queue) information is displayed using the sinfo command.
$ sinfo
PARTITION AVAIL  TIMELIMIT   NODES  STATE  NODELIST
short     up     30:00           7  mix    cn[002,004,113,123,129,131-132]
short     up     30:00           1  alloc  cn133
short     up     30:00          32  idle   cn[005,035-038,043-046,110-112,114-121,136-139,144-147,152-155]
norm*     up     5-00:00:00      1  drng@  cn008
...
Another useful command is freen, a program written for the Biowulf cluster. This command provides detailed information on all nodes that are available for each partition.
A common issue to be aware of is that additional computing resources do not always speed up the run time of a particular program. Many programs are inherently single-threaded, and others have not been coded to take advantage of multiple cores; requesting more than one core for such a program is fruitless. For programs that can take advantage of multiple cores, it is not always best practice to allocate a large number of parallel threads. A law of diminishing returns, Amdahl's law, means that doubling the number of cores assigned to a task may not double the performance. In practical terms, check the program documentation to see what the recommended number of cores is. Asking for more than this is a waste of resources and may actually increase the execution time of the program.
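For reference, this is Amdahl's law in its usual form, where p is the fraction of the program that can run in parallel and N is the number of cores (a general formula, not an FRCE-specific result):

% Amdahl's law: upper bound on speedup with N cores
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}

For example, with p = 0.9, S(8) is roughly 4.7, and the speedup can never exceed 10 no matter how many cores are added.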
Please do not request a large number of cores without considering if these cores can be used effectively.
Software
The FRCE cluster is running Oracle Enterprise Linux 8, equivalent to Red Hat Enterprise Linux 8 or Rocky Linux 8. This distribution provides an extensive collection of utilities, including shells, editors, compilers, scripting languages, and other tools. In general, any utility that is expected on a Linux system can be accessed through its command name, e.g. gcc -o hello hello.c.
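As a trivial illustration (hello.c here is just an example source file, not something provided on the cluster):

$ cat hello.c
#include <stdio.h>
int main(void) { printf("Hello, World!\n"); return 0; }
$ gcc -o hello hello.c
$ ./hello
Hello, World!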
Layered over this is a network share where many other scientific and workflow packages are installed. These are loaded into the user's environment through the module command.
$ module display samtools
-------------------------------------------------------------------
/mnt/nasapps/modules-5.3.1/modulefiles/samtools/1.16.1:

module-whatis   {Tools for alignments in the SAM format}
prepend-path    PATH /mnt/nasapps/production/samtools/1.16.1/bin
prepend-path    MANPATH /mnt/nasapps/production/samtools/1.16.1/share/man
-------------------------------------------------------------------
$ module load samtools
[+] Loading samtools 1.16.1
$ samtools --version
samtools 1.16.1
Copyright (C) 2022 Genome Research Ltd.
A list of installed packages can be displayed using module avail. More information on how to use modules is available here.
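A few other commonly used module subcommands, shown as a sketch with samtools as the example package (standard Environment Modules behavior, not FRCE-specific):

$ module avail samtools      # list available versions of a specific package
$ module list                # show modules currently loaded
$ module unload samtools     # remove a module from the environment
$ module purge               # remove all loaded modules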
Where to go from here
A set of usage guides is available covering topics both specific to FRCE and applicable to batch clusters in general. In particular, the lectures sponsored by the ABCS are very useful, and links to many external videos are provided.