Cheatsheet for CSD3 (Cambridge HPC)

The documentation for the Cambridge HPC is good, but I've tried to summarize the most relevant parts below.

Logging In

You need to be on a Cambridge network or logged into the VPN. Then, from your terminal:

# For CPU cluster
ssh <CSRID>@login-cpu.hpc.cam.ac.uk

# For GPU cluster
ssh <CSRID>@login-gpu.hpc.cam.ac.uk

Basic conda on CSD3

You can install conda packages as follows:
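The original snippet isn't reproduced here, so below is a minimal sketch. The module name miniconda/3 and the Python version are assumptions; check module avail for the exact conda module available on CSD3.

# Load the conda module (name may differ -- check with: module avail)
module load miniconda/3

# Create a personal environment in your home directory and activate it
conda create -n graph_tool_env python=3.9
conda activate graph_tool_env

# Install packages, e.g. graph-tool from conda-forge
conda install -c conda-forge graph-tool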

Submitting a job to SLURM

Below I have modified the standard slurm_submit.peta4-skylake SLURM script to run a graph-tool job (a condensed sketch follows the list). The things I changed:

  • Line 15: the name of the project/account. This can be found by typing mybalance on the command line when logged into CSD3.
  • Nodes (line 17) and tasks (line 20) are both set to one for the test.
  • Lines 58-60 load the conda module and the virtual environment.
  • Line 63 specifies the application to run (test_graph_tool.py is just a simple script that imports graph-tool to check that it works).
  • I commented out line 93 and uncommented line 97 to run the basic application directly instead of using MPI.
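Since the full template is long, here is only a condensed sketch of the modified script rather than the complete file, so line numbers will not match the list above. The account name, partition, module name, and environment name are placeholders/assumptions to adapt:

#!/bin/bash
# Condensed sketch of the modified submission script (angle brackets are placeholders)
#SBATCH -J graph_tool_test
#SBATCH -A <YOUR-PROJECT-NAME>      # project/account name from mybalance
#SBATCH -p skylake                  # partition name is an assumption -- check the template
#SBATCH --nodes=1                   # one node for the test
#SBATCH --ntasks=1                  # one task for the test
#SBATCH --time=00:10:00

# Load conda and activate the environment (module name may differ)
module load miniconda/3
source activate graph_tool_env

# Run the application directly instead of via mpirun
python test_graph_tool.py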

More information can be found in the CSD3 documentation (https://docs.hpc.cam.ac.uk/hpc/).

Note: you need to deactivate the conda environment before submitting the job, as shown below.
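For example, assuming you kept the template filename for your modified script:

# Deactivate the environment, then submit the job script
conda deactivate
sbatch slurm_submit.peta4-skylake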

sbatch commands for submitting jobs via SLURM
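These are the standard SLURM commands for submitting and managing jobs (generic SLURM, not CSD3-specific); replace <jobid> with the ID printed by sbatch and <CSRID> with your user ID.

# Submit a job script and print the job ID
sbatch slurm_submit.peta4-skylake

# List your queued and running jobs
squeue -u <CSRID>

# Show accounting information for a finished job
sacct -j <jobid>

# Cancel a job
scancel <jobid>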

Docker and Singularity

There is a strange bug with the standard temporary directory used when pulling containers, so make a temporary directory in your home folder instead. For example:

mkdir .singularity_tmp

Then, you can run the singularity pull command (similar to Docker pull):

TMPDIR=~/.singularity_tmp singularity pull docker://marcosfelt/summit

Replace marcosfelt/summit with the name of the image you want to pull.
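Once pulled, Singularity writes a .sif image to the current directory; the filename summit_latest.sif below assumes the default <name>_<tag>.sif naming. You can then run commands inside it, for example:

# Open an interactive shell inside the container
singularity shell summit_latest.sif

# Or run a single command inside the container
singularity exec summit_latest.sif python --version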

To-Do

  • Set up conda and see if you can properly install graph-tool
  • Figure out how to submit jobs to SLURM using conda
  • Figure out how to use DMTCP for long-running jobs: https://docs.hpc.cam.ac.uk/hpc/user-guide/long.html
  • Consider testing JupyterLab: https://docs.hpc.cam.ac.uk/hpc/software-packages/jupyter.html