Launching jobs

CLEPS uses Slurm, a job scheduling system for Linux clusters. You must use it to submit and control jobs. It acts as a workload manager by allocating the required resources at the right time to run your jobs. This page provides information on job submission in both interactive and batch modes.

Commands used to manage and interact with your jobs are located in the Job management section.
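
For quick reference, here are a few standard Slurm commands you will use constantly (all of them are described in more detail in that section):

squeue -u $USER                # list your pending and running jobs
scontrol show job <jobid>      # display detailed information about a job
scancel <jobid>                # cancel a pending or running job
sacct -j <jobid>               # accounting information about a completed job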

Slurm terminology

Depending on node configuration, a cpu can be either a core or a thread. On nodes where hyperthreading is activated (see CLEPS Compute nodes), a cpu is a thread. Otherwise it is a core.

On CLEPS, the smallest allocatable consumable resource is a core: if you ask Slurm for a single cpu on a node with hyperthreading activated, you will get a whole core allocated, but your task will run on a single thread of this core. This configuration ensures that no other user can use the other thread, limiting interference with other users' jobs.

Examples:

  • Allocation of a single cpu on a node with hyperthreading activated (node022):

user@cleps:~ srun -w node022 [-c 1] ..
user@cleps:~ scontrol show job <jobid>
...
NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
...

Only one cpu (thread) is used per task while two are allocated (the whole core).

  • Allocation of a single cpu on a node without hyperthreading activated (node055):

user@cleps:~ srun -w node055 [-c 1] ..
user@cleps:~ scontrol show job <jobid>
...
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
...

One cpu (core) is allocated and used.
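
If you want your task to use both hardware threads of the allocated core on a hyperthreaded node, request two cpus explicitly. A minimal sketch, reusing node022 from the example above:

user@cleps:~ srun -w node022 -c 2 ..

scontrol show job would then report CPUs/Task=2, and your task may use both threads of the core.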

General use scripts

Below are some sample scripts that you can use as templates for building your own Slurm submission scripts on CLEPS. Make sure you understand the meaning of each sbatch directive to avoid wasting valuable computing resources or getting unexpected results. If you need more information, please refer to the sbatch documentation.
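
Note that every #SBATCH directive can also be passed, or overridden, directly on the sbatch command line. For instance, with a hypothetical script my_script.batch:

# Command-line options take precedence over the corresponding #SBATCH directives
sbatch --time=00:05:00 --mem=200m my_script.batch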

Note

Refer to the Getting started section for retrieving the cleps_examples folder.

Single-threaded jobs

In this simple example, we submit a sequential job. It would not benefit from multiple cpus, so the --cpus-per-task option is set to 1. Note that this option could be omitted since 1 is its default value, as could the --ntasks option.

#!/bin/bash
#
# single_thread.batch
#

#SBATCH --job-name=single_thread      # Job name
#SBATCH --ntasks=1                    # A single task (process)
#SBATCH --cpus-per-task=1             # Using a single CPU
#SBATCH --mem=100m                    # Job memory request; if unit is not specified MB will be assumed
#SBATCH --time=00:01:00               # Time limit hrs:min:sec
#SBATCH --output=%x_%j.log            # Standard output and error log. %x denotes the job name, %j the jobid.

# If needed, load your environment (conda, ...)
# 
# source /home/$USER/.bashrc
# conda activate my_env

python single_thread.py

Submit the job with:

sbatch single_thread.batch

Single job with changing arguments

In some cases, e.g. when fine-tuning the parameters of an algorithm, the same job needs to be executed multiple times with different sets of arguments. Slurm provides a mechanism, called a job array, to launch this type of work automatically with minimal effort.

Job arrays allow you to submit a set of similar jobs at once. They are only supported for batch jobs. To submit such jobs, use the --array or -a option and give it a range or a comma-separated list of numbers.

sbatch --array=min-max array.batch
sbatch --array=1,5,7,12 array.batch

To identify each job or its parameters, Slurm provides the SLURM_ARRAY_TASK_ID variable to each job. For example, if you have a different input file for each job, with a naming pattern f_1.in, f_2.in, … f_n.in, you can write the following script:

#!/bin/bash
#
# array.batch
#
# Allocated resources are NOT SHARED across the jobs.
# They represent resources allocated for each job
# in the array.

#SBATCH --job-name=array_ex
#SBATCH --output=%x_%A_%a.out
#SBATCH --time=00:01:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2

# If needed, load your environment (conda, ...)
#
# source /home/$USER/.bashrc
# conda activate my_env

python print_args.py f_${SLURM_ARRAY_TASK_ID}.in

In this example, one task with two CPUs is allocated for each job in the array. For the output files, you’ll notice %A, which is replaced by the value of SLURM_ARRAY_JOB_ID, and %a, which is replaced by the value of SLURM_ARRAY_TASK_ID. Get more information on the Slurm job array documentation page.
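
If the array contains many jobs, you can also limit how many of them run simultaneously with the % separator. For example, the following (illustrative) command submits 100 jobs but lets at most 10 run at the same time:

sbatch --array=0-99%10 array.batch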

Multi-threaded jobs

  • OpenMP job

#!/bin/bash
#SBATCH --job-name=threads_omp       # Job name
#SBATCH --nodes=1                    # Run all processes on a single node
#SBATCH --ntasks=1                   # Run a single task	
#SBATCH --cpus-per-task=4            # Number of CPU cores per task
#SBATCH --mem=100mb                  # Total memory limit; if unit is not specified MB will be assumed
#SBATCH --time=00:01:00              # Time limit hrs:min:sec
#SBATCH --output=%x_%j.log            # Standard output and error log. %x denotes the job name, %j the jobid.

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

module load gnu13

./threads_omp

To test this example:

# OpenMP multithread

cd multi-threaded/openmp
mkdir build && cd build

# Compilation
module load cmake gnu13
CC=gcc cmake ..
make

# Execution
sbatch ../threads_omp.batch

MPI job

#!/bin/bash
#SBATCH --job-name=hello_world_mpi        # Set job name
#SBATCH --partition=cpu_homogen           # Proper partition for MPI jobs  
#SBATCH --time=00:02:00                   # How long you think your job will run (HH:MM:SS)
#SBATCH --nodes=2                         # Number of nodes to allocate
#SBATCH --ntasks-per-node=2               # Number of MPI processes per node
#SBATCH --mem=500m                        # Memory required per node
#SBATCH --output=%x_%j.log                # Standard output and error log. %x denotes the job name, %j the jobid.

module purge
module load gnu13 openmpi5

srun ./build/hello_world_mpi

This example is used in the Getting started section.

MPI/Multi-threaded jobs

#!/bin/bash
#SBATCH --job-name=threads_mpi    # Job name
#SBATCH --partition=cpu_homogen   # Proper partition for MPI jobs  
#SBATCH --nodes=4                 # Number of nodes to allocate for your job
#SBATCH --tasks-per-node=1        # Number of tasks per node
#SBATCH --cpus-per-task=4         # Number of threads per task
                                  # (hyperthreading is active on cpu_homogen partition)
#SBATCH --mem=500mb               # Memory required by node
                                  # If unit is not specified MB will be assumed
#SBATCH --time=00:01:00           # Time limit hrs:min:sec
#SBATCH --output=%x_%j.log        # Standard output and error log

# Clear the environment from any previously loaded modules
module purge

# Load the module environment suitable for the job
module load gnu13 openmpi5

# Run the job. srun knows from options how many processes to spawn.
srun ./build/threads_mpi
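
If threads_mpi spawns its threads with OpenMP (an assumption about this example's implementation), you will usually also want to tell the runtime how many threads to start, by adding the following line before the srun call, as in the OpenMP example above:

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK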

To test this example:

# cd threads_mpi

module load gnu13 openmpi5 cmake
CC=gcc cmake .
make

sbatch threads_mpi.sbatch

GPU specific scripts

Note

Refer to the Getting started section for retrieving the cleps_examples folder. The source code for these examples is in the gpu sub-folder.

Note

When you use GPU cards in your jobs, you should use the --cpus-per-gpu option instead of --cpus-per-task. Slurm will then ensure that the allocated cpus are close (on the same NUMA node) to the allocated GPU cards. Our advice is to refer to the cpus/gpu column of the CLEPS Compute nodes table, and to give this value to --cpus-per-gpu.

For example, if you want to use a node with an rtx8000 GPU card (gpu00[6-9]), the cpus/gpu column reads 16, so use --cpus-per-gpu=16.
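
As an illustration, the relevant directives for such a job could look like the following sketch (assuming the gres type is named rtx8000 like the card; adjust the number of cards to your needs):

#SBATCH --partition=gpu          # GPU nodes are in the gpu partition
#SBATCH --gres=gpu:rtx8000:1     # one rtx8000 card
#SBATCH --cpus-per-gpu=16        # value taken from the cpus/gpu column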

Tensorflow

In order to use TensorFlow, you need to install the library. The preferred method is to use a local Conda environment. Please refer to the Tensorflow environment section for more information.

Single GPU

Once you have installed the library, to submit a job that uses a single GPU, you must use the gpu partition and set the --gres Slurm option with the number of GPUs you want to use and (optionally) the type of GPU that you need. These options can be set in the submission script. Here is an example of a Slurm script used to launch a single-GPU job:

#!/bin/bash
#
# tf_example_singleGPU.batch
#

#SBATCH --job-name=tf_example    # create a short name for your job
#SBATCH --cpus-per-gpu=16        # Number of cpus per GPU card (>1 if multi-threaded tasks)
#SBATCH --partition=gpu          # Name of the partition
#SBATCH --gres=gpu:rtx6000:1     # GPU nodes are only available in gpu partition
#SBATCH --mem=20G                # Total memory allocated
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --output=%x_%j.out   # output file name

echo "### Running $SLURM_JOB_NAME ###"

cd ${SLURM_SUBMIT_DIR}

module purge

# Set your conda environment
source /home/$USER/.bashrc
# tensorflow environment should be created previously
source activate tf_env

python ./tf_example.py

To run it, go to the folder containing the script and enter:

# cd gpu/tensorflow
sbatch tf_example_singleGPU.batch

This job will train a small CNN, using the MNIST dataset.

Multiple GPUs in one node

The job launched in the previous section can also be launched on several GPUs at the same time. This can be useful, for example, when tuning hyperparameters. This feature uses the job array mechanism previously described in Single job with changing arguments.

In this example, a training job is launched to train a CNN using the same dataset as in the previous example; two batch sizes are tested in parallel:

#!/bin/bash

#SBATCH --job-name=tf_example    # create a short name for your job
#SBATCH --cpus-per-gpu=16        # Number of cpus per GPU card (>1 if multi-threaded tasks)
#SBATCH --partition=gpu          # Name of the partition
#SBATCH --gres=gpu:rtx6000:1     # GPU nodes are only available in gpu partition
#SBATCH --mem=20G                # Total memory allocated
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --output=%x_%j.out   # output file name
#SBATCH --array=0-1

echo "### Running $SLURM_JOB_NAME ###"

cd ${SLURM_SUBMIT_DIR}

# Running with different parameters, in this case different batch sizes for
# the training
opt[0]="32"
opt[1]="64"

module purge

# Set your conda environment
source /home/$USER/.bashrc
# tensorflow environment should be created previously
source activate tf_env

python ./tf_example.py --batch_size ${opt[$SLURM_ARRAY_TASK_ID]}

To schedule the job, enter the following command:

# cd gpu/tensorflow
sbatch tf_example_multipleGPU.batch
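
Each array task writes its own log file. To follow the state of all tasks of the array, you can use sacct with the job id printed by sbatch:

sacct -j <jobid> --format=JobID,JobName,State,Elapsed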

PyTorch

Using PyTorch requires installing the library. Go to the PyTorch environment section for details on how to do it. Files used in these examples are available here.

Single GPU

This job (pt_example.py) will train a CNN using the MNIST dataset. To launch the job on a single GPU, the following script is used:

#!/bin/bash

#SBATCH --job-name=pt_example    # create a short name for your job
#SBATCH --cpus-per-gpu=16        # Number of cpus per GPU card (>1 if multi-threaded tasks)
#SBATCH --partition=gpu          # Name of the partition
#SBATCH --gres=gpu:rtx6000:1     # Number and type of GPU cards allocated
#SBATCH --mem=20G                # Total memory allocated
#SBATCH --time=00:05:00          # total run time limit (HH:MM:SS)
#SBATCH --output=%x_%j.out       # output file name

echo "### Running $SLURM_JOB_NAME ###"

set -x
cd ${SLURM_SUBMIT_DIR}

module purge

# Set your conda environment
source /home/$USER/.bashrc
# pytorch environment should be created previously
source activate pt_env

python ./pt_example.py

To reserve a node and launch the job, type:

sbatch pt_example_singleGPU.batch

Multiple GPUs in one node

In this case, we launch the same job (pt_example.py), but this time we implement some basic parallelism by using the job array mechanism again. The Slurm script schedules 3 jobs with different gamma steps to train 3 different CNN models.

#!/bin/bash

#SBATCH --job-name=pt_example    # create a short name for your job
#SBATCH --cpus-per-gpu=16        # Number of cpus per GPU card (>1 if multi-threaded tasks)
#SBATCH --partition=gpu          # Name of the partition
#SBATCH --gres=gpu:rtx6000:1     # Number and type of GPU cards allocated
#SBATCH --mem=20G                # Total memory allocated
#SBATCH --time=00:05:00          # total run time limit (HH:MM:SS)
#SBATCH --output=%x_%j.out       # output file name
#SBATCH --array=0-2

echo "### Running $SLURM_JOB_NAME with array task $SLURM_ARRAY_TASK_ID ###"

set -x
cd ${SLURM_SUBMIT_DIR}

module purge

# Set your conda environment
source /home/$USER/.bashrc
# pytorch environment should be created previously
source activate pt_env

GAMMA_STEP=('0.1' '0.5' '1.0')
python ./pt_example.py --gamma ${GAMMA_STEP[$SLURM_ARRAY_TASK_ID]}

To launch the job, type:

sbatch pt_example_multiGPU.batch

Other useful options

Here are other useful options you can add in your batch script:

Option          Role
--mail-type     Mail events (NONE, BEGIN, END, FAIL, ALL)
--mail-user     Mail address (<email>@inria.fr)
--error         Stderr redirection in a different file (ex: %x_%j.err)

As you can see in the --error option above, you can use patterns to name files: %x refers to the job name, %j to the jobid. This naming scheme ensures a unique output file name for every run. You can find the exhaustive list of patterns in the sbatch documentation.
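
As an illustration, these options could be added to any of the scripts above as follows (replace the address with your own):

#SBATCH --mail-type=END,FAIL           # notify when the job ends or fails
#SBATCH --mail-user=<email>@inria.fr   # where notifications are sent
#SBATCH --error=%x_%j.err              # write stderr to a separate file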

Interactive sessions

CLEPS allows launching interactive sessions on compute nodes through SLURM. These sessions are recommended for debugging and fast prototyping. Default time allocation is one hour.

The basic command to get an interactive session is:

salloc [-c nb_cpus ...]

Note that no command is given to salloc; a shell is automatically started on an available node. The default allocation is one core (two cpus on nodes with hyperthreading activated, one cpu otherwise) and a small amount of memory (depending on the partition).
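
If the defaults are not enough, you can request more resources and a longer time limit explicitly. For example (values are purely illustrative):

salloc -c 4 --mem=8G --time=02:00:00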

Example: Interactive session on a GPU node, with two rtx6000 and 8 cpus per card allocated:

salloc --gres=gpu:rtx6000:2 -p gpu --cpus-per-gpu=8

If you need X11 redirection when using computing resources interactively, connect to CLEPS with the ssh -X option and add the --x11 option to salloc:

ssh -X <login>@cleps
salloc --x11

Using Jupyter notebooks

Jupyter notebooks can be launched from the CLEPS cluster. This tool can be used for rapid prototyping or quick debugging; however, for intensive processing work, launching jobs with a batch script remains the preferred method. Generally speaking, the recommended way to start a Jupyter notebook is to launch an interactive session and start the Jupyter server there, then redirect the output through an SSH tunnel to your local machine. A browser pointed at the right web address, with the corresponding token, will then allow you to work from your local computer.

Step by step:

  • Create a Conda environment and install JupyterLab (or Jupyter notebook) inside.

conda create --name jupyter_env python=3.10
conda activate jupyter_env
pip install jupyterlab
  • Connect interactively to a node; in this example, we target a gpu node:

salloc -c 8 --gres=gpu:v100:1 --hint=multithread -p gpu
  • Activate your Conda environment and launch the notebook:

conda activate jupyter_env
jupyter-lab --no-browser --port 8001 --ip $(hostname)
  • From your local machine, create a tunnel through the cluster’s login node to the node where your interactive session was launched. For example, if the hostname is gpu001, your ssh tunnel will be:

ssh -N -L 8001:gpu001:8001 cleps.paris.inria.fr
  • Open your browser at http://localhost:8001 and enter the token that you got when the notebook was launched. Your notebook will be ready to start working!

Visualization

Note

August 2024: The ParaView server is not available yet, but will be installed soon.

ParaView is available via the module command. It has been compiled with gnu8 and mpich, therefore the command to load it is the following:

module load gnu8 mpich paraview

It is meant to be used in client/server mode to perform rendering on the server side. There are two main constraints when using ParaView on CLEPS:

  • Use the exact same version (5.11.1) on the client and on the server (available here)

  • Launch the server on a gpu node

To use ParaView, start an interactive job on a gpu node, load the required modules and start the server:

salloc [-c <nb_cpus>] -p gpu --gpus-per-node=1
module load gnu8 mpich paraview
pvserver --force-offscreen-rendering

By default the server listens on port 11111, but you can change it with the -p <port number> option. Then, locally, create an ssh tunnel between your machine and the allocated node (ex: gpu007):

ssh -N cleps -L 8888:gpu007:11111

where 8888 is the local port used to create the tunnel. You can use whatever port (>1023) you wish, as long as there is no firewall blocking it. Finally, start the ParaView client on your machine and connect to the server:

  • Click on the Connect icon

  • Add Server

  • Leave localhost but edit the port to match the one you chose locally for the ssh tunnel

If everything goes well you should see a message on the server side confirming the connection with the client.

ParaView can be used on several nodes if your data cannot fit in the RAM of one node. To use it, allocate more than one node with one task per node and start the server with srun:

salloc [-c <nb_cpus>] -N <nb_nodes> --ntasks-per-node=1 -p gpu --gpus-per-node=1
srun pvserver --force-offscreen-rendering

Here is a useful link to optimize your workflow with ParaView.