Running Jobs on the Cluster
The login system offers limited performance, so use the batch scheduler to reach the high-performance resources. You can either request an interactive session or submit batch jobs.
Starting Interactive Sessions
An interactive session logs you in to a compute node, where your commands run in real time. By default, these sessions have a time limit of 4 hours; after 4 hours, you will be logged out automatically. The resources behind an interactive session may be shared with other users, so these sessions are best suited to development and testing. Some samples to get started:
Log in to a CPU-only node and run on four of the latest Intel Xeon "Cascade Lake" CPU cores
Log in to a CPU-only node and run on four of the latest AMD EPYC "Rome" CPU cores
Start a run that also includes an NVIDIA Tesla P100 GPU
Run with -h to see the many other interactive session requests that are available
View the list of various Compute Node types/capabilities
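The site's own session commands are not reproduced in this text. As a rough sketch only, and assuming a plain Slurm installation (the page mentions squeue), a request for four CPU cores with the default 4-hour limit might be assembled as follows. The variable names are illustrative, and any partition, CPU-type, or GPU options are site-specific and deliberately left out.

```shell
#!/usr/bin/env bash
# Assumption: generic Slurm "srun --pty"; this is NOT the site's own wrapper command.
CORES=4                 # CPU cores to request
TIME_LIMIT="04:00:00"   # matches the default 4-hour session limit

# Build the request as an array so each option stays a separate argument.
request=(srun --ntasks=1 --cpus-per-task="$CORES" --time="$TIME_LIMIT" --pty bash)

# Print the command for review instead of launching it.
printf '%s ' "${request[@]}"
echo
```

To actually start the session on a Slurm system, run `"${request[@]}"` in place of the final two lines.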
Inside your Interactive Session
Once your interactive session has started, you will see a printout similar to that shown below. Any commands you run will be executed on the compute node. When you exit the shell, your interactive session will end (and remember that sessions default to 4 hours, so be sure to watch the clock).
Savvy users will note that the interactive session is actually a GNU Screen session. This means that additional shells can be started within the interactive session. A session of htop automatically loads in the first screen window to show current system load. If you're new to command-line access, simply use the shell screen that appears; there's no need to start multiple shells.
Batch Scripts
Batch jobs provide dedicated resources that are not shared between users. These are good for achieving the best performance numbers, but are not interactive. You must submit a job script to be executed by the batch scheduler.
Batch scripts are files containing a list of commands which the cluster should execute. These are typically written with a shell scripting language, such as Bash.
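As an illustration of what such a file can look like, here is a minimal sketch that assumes the scheduler is Slurm (suggested by the squeue command used on this page). The file name hello_job.sh, the job name, and the resource values are hypothetical placeholders, not the contents of the site's sample scripts.

```shell
#!/usr/bin/env bash
# Write a minimal Slurm batch script to a hypothetical file name.
cat > hello_job.sh <<'EOF'
#!/bin/bash
# Name shown in the queue
#SBATCH --job-name=hello
# Number of tasks (processes) to start
#SBATCH --ntasks=4
# Wall-clock limit (HH:MM:SS)
#SBATCH --time=00:10:00
# Output file; %j expands to the job ID
#SBATCH --output=hello_%j.out

# Commands below run on the allocated compute node(s).
echo "Job started on $(hostname)"
srun hostname
EOF

# Check the script for shell syntax errors without executing it.
bash -n hello_job.sh && echo "hello_job.sh: syntax OK"
```

The `#SBATCH` lines are ordinary comments to Bash, so the same file can be syntax-checked locally before it is handed to the scheduler.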
You will see example batch scripts in your home directory. Submit these as-is to see results for common test cases. Change them as needed to use your own applications and input data. Then save the file and submit it to the scheduler.
Before running your own custom application, try submitting a sample batch script
You may then copy and/or change the batch script to suit the needs of your custom application.
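Submission itself might look like the sketch below, again assuming Slurm. The helper name submit_job is hypothetical; it wraps sbatch with its --parsable option so that only the job ID is printed.

```shell
#!/usr/bin/env bash
# submit_job SCRIPT: submit a batch script and print the resulting job ID.
submit_job() {
    local script="$1"
    if [ ! -f "$script" ]; then
        echo "error: no such batch script: $script" >&2
        return 1
    fi
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "error: sbatch not found (are you on the cluster?)" >&2
        return 2
    fi
    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
    local out
    out=$(sbatch --parsable "$script") || return 3
    echo "${out%%;*}"
}
```

For example, `job_id=$(submit_job my_script.sh)` stores the ID for later use, where my_script.sh stands for whichever batch script you want to run.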
Review the status of your running jobs
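A minimal sketch of a status check, assuming the standard Slurm squeue command; the helper name my_jobs and the column layout are illustrative choices only.

```shell
#!/usr/bin/env bash
# my_jobs: list the current user's queued and running jobs.
my_jobs() {
    if ! command -v squeue >/dev/null 2>&1; then
        echo "error: squeue not found (are you on the cluster?)" >&2
        return 1
    fi
    # -u limits the listing to one user; -o selects the columns:
    # %i job ID, %j job name, %T state, %M time used, %D node count,
    # %R reason (pending jobs) or node list (running jobs)
    squeue -u "${USER:-$(id -un)}" -o "%.10i %.20j %.10T %.10M %.6D %R"
}
```

Calling `my_jobs` prints one line per job; the first column is the job ID needed for cancelling.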
Cancel a job (whether queued or already running)
You may cancel/kill any of your jobs if you find that they are not running properly or the queue is too full. First, run squeue to determine the ID number of your job. Then, cancel the job using its ID number.
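The cancel step could be sketched as follows, assuming Slurm's scancel command. The helper name cancel_job is hypothetical; it rejects anything that does not look like a job ID before calling the scheduler.

```shell
#!/usr/bin/env bash
# cancel_job JOB_ID: cancel one job after a basic sanity check on the ID.
cancel_job() {
    local job_id="$1"
    # Job IDs are numeric; array tasks have the form 12345_7.
    if ! [[ "$job_id" =~ ^[0-9]+(_[0-9]+)?$ ]]; then
        echo "error: '$job_id' does not look like a job ID" >&2
        return 1
    fi
    if ! command -v scancel >/dev/null 2>&1; then
        echo "error: scancel not found (are you on the cluster?)" >&2
        return 2
    fi
    scancel "$job_id"
}
```

For instance, `cancel_job 12345` cancels the job with that ID, whether it is still queued or already running.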
Review the history of your jobs
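One way to do this, assuming Slurm accounting is enabled so that sacct is available; the helper name job_history, the seven-day default window, and the chosen columns are illustrative.

```shell
#!/usr/bin/env bash
# job_history [SINCE]: show the current user's jobs since a given start time.
job_history() {
    # Default to the last seven days; any start time sacct accepts also works.
    local since="${1:-now-7days}"
    if ! command -v sacct >/dev/null 2>&1; then
        echo "error: sacct not found (are you on the cluster?)" >&2
        return 1
    fi
    # -X shows one line per job rather than one line per job step.
    sacct -X -u "${USER:-$(id -un)}" -S "$since" \
          --format=JobID,JobName%20,Partition,State,Elapsed,ExitCode
}
```

Running `job_history` with no argument lists the past week; `job_history 2024-01-01` would list everything since that date.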