Running Jobs on the Cluster

The login system has limited performance, so use the batch scheduler to reach the high-performance compute nodes. You can either request an interactive session or submit batch jobs.

Starting Interactive Sessions

An interactive session logs you into a compute node, where your commands run in real time. By default, these sessions have a time limit of 4 hours - after 4 hours, you will be logged out automatically. Interactive sessions may run on resources shared with other users, but they are well suited to development and testing. Some samples to get started:

Log in to a CPU-only node and run on four of the latest Intel Xeon "Cascade Lake" CPU cores

interactive_session -c 4 -f Cascade

Log in to a CPU-only node and run on four of the latest AMD EPYC "Rome" CPU cores

interactive_session -c 4 -f Rome

Start a session that also includes an NVIDIA Tesla P100 GPU

interactive_session -c 4 -g gpu:Tesla-P100-PCIE-16GB:1

Run with -h to see the many other interactive session options that are available

interactive_session -h

View the list of compute node types and capabilities

slurm_node_info

Inside your Interactive Session

Once your interactive session has started, you will see a status printout. Any commands you run will be executed on the compute node. When you exit the shell, your interactive session will end (remember that sessions default to 4 hours, so be sure to watch the clock).

Savvy users will note that the interactive session is actually a GNU Screen session. This means that additional shells can be started within the interactive session. A session of htop automatically loads in the first screen window to show current system load. If you are new to command-line access, simply use the shell screen that appears - there is no need to start multiple shells.

Batch Scripts

Batch jobs provide dedicated resources that are not shared between users. These are good for achieving the best performance numbers, but are not interactive. You must submit a job script to be executed by the batch scheduler.

Batch scripts are files containing a list of commands which the cluster should execute. These are typically written with a shell scripting language, such as Bash.
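
As a minimal sketch of what such a file can look like, the script below requests resources through #SBATCH comment lines and then runs ordinary shell commands. The directives are standard Slurm options, but the specific values (job name, four cores, ten-minute limit, output file name) are illustrative assumptions and are not taken from the sample scripts provided on this cluster.

```shell
#!/bin/bash
# Resource requests: sbatch reads these comment lines; plain bash ignores them.
# All values here are illustrative assumptions.
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00
# %j in the output file name is replaced by the job ID.
#SBATCH --output=example-%j.out

# Slurm sets these variables inside a job; the fallbacks let the script
# also run outside the scheduler for a quick check.
job_id="${SLURM_JOB_ID:-none}"
cpus="${SLURM_CPUS_PER_TASK:-1}"

echo "Node: $(uname -n)"
echo "Job ID: ${job_id}"
echo "CPU cores available to this task: ${cpus}"
```

Saved under a name such as example.sbatch (a hypothetical file name), a script like this would be submitted with sbatch example.sbatch, and its output would appear in the file named by the --output line.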

You will see example batch scripts in your home directory. Submit these as-is to see results for common test cases. Change them as needed to use your own applications and input data. Then save the file and submit to the scheduler.

Before running your own custom application, try submitting a sample batch script

cd characterize-hpc-node/
sbatch run.sbatch

You may then copy and/or change the batch script to suit the needs of your custom application.

Review the status of your running jobs

squeue -l

Cancel a job (whether queued or already running)

You may cancel/kill any of your jobs if you find that they are not running properly or the queue is too full. First, run squeue to determine the ID number of your job. Then, cancel the job using its ID number.

scancel <job_id_number>
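
Looking up the ID with squeue works, but the ID can also be recorded at submission time. The sketch below assumes sbatch prints its usual confirmation line ("Submitted batch job" followed by the ID). The name parse_job_id is a hypothetical helper, and run.sbatch stands for the sample script mentioned earlier. The Slurm commands only run if sbatch is found on the PATH and the script file exists.

```shell
#!/bin/bash

# Hypothetical helper: extract the numeric ID from a line such as
# "Submitted batch job 12345". Prints nothing and returns 1 on no match.
parse_job_id() {
    local line="$1"
    if [[ "$line" =~ ^Submitted\ batch\ job\ ([0-9]+) ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

# Only talk to the scheduler when Slurm is installed and the script exists.
if command -v sbatch >/dev/null 2>&1 && [ -f run.sbatch ]; then
    if job_id=$(parse_job_id "$(sbatch run.sbatch)"); then
        echo "Submitted job ${job_id}"
        # Show just this job in the queue.
        squeue -j "${job_id}"
        # To cancel it later: scancel "${job_id}"
    else
        echo "Could not determine the job ID" >&2
    fi
fi
```

sbatch also offers a --parsable option that prints only the job ID, which can remove the need for the parsing step.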

Review the history of your jobs

slurm history
