Launching jobs
CLEPS uses Slurm, a job scheduling system for Linux clusters. You must use it to submit and control jobs. It acts as a workload manager, allocating the required resources at the right time to run your jobs. This page provides information on job submission in both interactive and batch modes.
Commands used to manage and interact with your jobs are located in the Job management section.
Slurm terminology
Depending on node configuration, a cpu can be either a core or a thread. On nodes where hyperthreading is activated (see CLEPS Compute nodes), a cpu is a thread. Otherwise it is a core.
On CLEPS, the smallest allocatable consumable resource is a core: if you ask Slurm for a single cpu on a node with hyperthreading activated, a whole core is allocated but your task will run on a single thread of this core. This configuration ensures that no other user will use the other thread, limiting interference with other users' jobs.
Examples:
Allocation of a single cpu on a node with hyperthreading activated (node022):
user@cleps:~$ srun -w node022 [-c 1] ...
user@cleps:~$ scontrol show job <jobid>
...
NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
...
Only one cpu (a thread) is used per task while two are allocated (the whole core).
Allocation of a single cpu on a node without hyperthreading activated (node055):
user@cleps:~$ srun -w node055 [-c 1] ...
user@cleps:~$ scontrol show job <jobid>
...
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
...
One cpu (core) is allocated and used.
General use scripts
Below are some sample scripts that you can use as templates for building your own Slurm submission scripts on CLEPS. Make sure you understand the meaning of each sbatch directive to avoid wasting valuable computing resources and getting unexpected results. If you need more information, please refer to the sbatch documentation.
Note
Refer to Getting started section for retrieving the cleps_examples folder.
Single-threaded jobs
In this simple example, we submit a sequential job. It would not benefit from multiple cpus, so the --cpus-per-task option is set to 1. Note that this option could be omitted since 1 is the default value, as is the case for the --ntasks option.
#!/bin/bash
#
# single_thread.batch
#
#SBATCH --job-name=single_thread # Job name
#SBATCH --ntasks=1 # A single task (process)
#SBATCH --cpus-per-task=1 # Using a single CPU
#SBATCH --mem=100m # Job memory request; if unit is not specified MB will be assumed
#SBATCH --time=00:01:00 # Time limit hrs:min:sec
#SBATCH --output=%x_%j.log # Standard output and error log. %x denotes the job name, %j the jobid.
# If needed, load your environment (conda, ...)
#
# source /home/$USER/.bashrc
# conda activate my_env
python single_thread.py
sbatch single_thread.batch
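The single_thread.py script referenced above ships in the cleps_examples folder; it is not reproduced here. As a rough stand-in, it could be any single-threaded Python program, e.g. this hypothetical sketch:

```python
# single_thread.py -- hypothetical stand-in for the example's workload;
# the real script is provided in the cleps_examples folder.
import math


def work(n):
    # CPU-bound toy computation running on a single thread
    return sum(math.sqrt(i) for i in range(n))


if __name__ == "__main__":
    print(f"result: {work(1_000_000):.2f}")
```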
Single job with changing arguments
In some cases, e.g. when fine-tuning the parameters of an algorithm, the same job needs to be executed multiple times with different sets of arguments. Slurm provides a mechanism to launch this type of work automatically, with minimal effort: the array of jobs.
Job arrays allow you to submit a set of similar jobs at once. They are only supported for batch jobs. To submit such jobs, use the --array (or -a) option and give it a range or a comma-separated list of numbers.
sbatch --array=min-max array.batch
sbatch --array=1,5,7,12 array.batch
To identify jobs or their parameters, Slurm provides the SLURM_ARRAY_TASK_ID variable to each job. For example, if you have a different input file for each job, with a naming pattern f_1.in, f_2.in, … f_n.in, you can write the following script:
#!/bin/bash
#
# array.batch
#
# Allocated resources are NOT SHARED across the jobs.
# They represent resources allocated for each job
# in the array.
#SBATCH --job-name=array_ex
#SBATCH --output=%x_%A_%a.out
#SBATCH --time=00:01:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
# If needed, load your environment (conda, ...)
#
# source /home/$USER/.bashrc
# conda activate my_env
python print_args.py f_${SLURM_ARRAY_TASK_ID}.in
In this example, one task with two CPUs is allocated for each job. For the output files, you'll notice %A, which is replaced by the value of SLURM_ARRAY_JOB_ID, and %a, which is replaced by the value of SLURM_ARRAY_TASK_ID. Get more information on the Slurm job array documentation page.
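The print_args.py script used above is part of the examples folder; a minimal hypothetical stand-in might look like:

```python
# print_args.py -- hypothetical stand-in for the script used in the array
# example; the real one lives in the cleps_examples folder.
import sys


def main(argv):
    # Each array task receives its own input file name (f_<SLURM_ARRAY_TASK_ID>.in)
    for arg in argv:
        print(f"received argument: {arg}")
    return len(argv)


if __name__ == "__main__":
    main(sys.argv[1:])
```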
Multi-threaded jobs
OpenMP job
#!/bin/bash
#SBATCH --job-name=threads_omp # Job name
#SBATCH --nodes=1 # Run all processes on a single node
#SBATCH --ntasks=1 # Run a single task
#SBATCH --cpus-per-task=4 # Number of CPU cores per task
#SBATCH --mem=100mb # Total memory limit; if unit is not specified MB will be assumed
#SBATCH --time=00:01:00 # Time limit hrs:min:sec
#SBATCH --output=%x_%j.log # Standard output and error log. %x denotes the job name, %j the jobid.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
module load gnu13
./threads_omp
To test this example:
# OpenMP multithread
cd multi-threaded/openmp
mkdir build && cd build
# Compilation
module load cmake gnu13
CC=gcc cmake ..
make
# Execution
sbatch ../threads_omp.batch
MPI job
#!/bin/bash
#SBATCH --job-name=hello_world_mpi # Set job name
#SBATCH --partition=cpu_homogen # Proper partition for MPI jobs
#SBATCH --time=00:02:00 # How long you think your job will last (HH:MM:SS)
#SBATCH --nodes=2 # Number of nodes to allocate
#SBATCH --ntasks-per-node=2 # Number of MPI processes per node
#SBATCH --mem=500m # Memory required per node
#SBATCH --output=%x_%j.log # Standard output and error log. %x denotes the job name, %j the jobid.
module purge
module load gnu13 openmpi5
srun ./build/hello_world_mpi
This example is used in the Getting started section.
MPI/Multi-threaded jobs
#!/bin/bash
#SBATCH --job-name=threads_mpi # Job name
#SBATCH --partition=cpu_homogen # Proper partition for MPI jobs
#SBATCH --nodes=4 # Number of nodes to allocate for your job
#SBATCH --ntasks-per-node=1 # Number of tasks per node
#SBATCH --cpus-per-task=4 # Number of threads per task
# (hyperthreading is active on cpu_homogen partition)
#SBATCH --mem=500mb # Memory required per node
# If unit is not specified MB will be assumed
#SBATCH --time=00:01:00 # Time limit hrs:min:sec
#SBATCH --output=%x_%j.log # Standard output and error log
# Clear the environment from any previously loaded modules
module purge
# Load the module environment suitable for the job
module load gnu13 openmpi5
# Run the job. srun knows from options how many processes to spawn.
srun ./build/threads_mpi
To test this example:
# cd threads_mpi
module load gnu13 openmpi5 cmake
CC=gcc cmake .
make
sbatch threads_mpi.sbatch
GPU specific scripts
Note
Refer to Getting started section for retrieving the cleps_examples folder. The source code for these examples is in gpu sub-folder.
Note
When you use GPU cards in your jobs, you should use the --cpus-per-gpu option instead of --cpus-per-task. Slurm will ensure that the allocated cpus are close (on the same NUMA node) to the allocated GPU cards. Our advice is to refer to the cpus/gpu column of the CLEPS Compute nodes table, and to give this value to --cpus-per-gpu. For example, if you want to use a node with an rtx8000 GPU card (gpu00[6-9]), the cpus/gpu column reads 16, so use --cpus-per-gpu=16.
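Following that advice, the corresponding directives in a batch script might read as follows (a sketch, assuming the rtx8000 nodes described above):

```shell
#SBATCH --partition=gpu        # GPU nodes are only available in the gpu partition
#SBATCH --gres=gpu:rtx8000:1   # One rtx8000 card
#SBATCH --cpus-per-gpu=16      # cpus/gpu value from the CLEPS Compute nodes table
```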
Tensorflow
In order to use TensorFlow you need to install the library. The preferred method is to use a local Conda environment. Please refer to the Tensorflow environment section for more information.
Single GPU
Once you have installed the library, to submit a job that uses a single GPU, you must use the gpu partition and set the gres Slurm option with the number of GPUs you want to use and (optionally) the type of GPUs you need. These features can be set in the submission script. Here is an example of a Slurm script used to launch a single-GPU job:
#!/bin/bash
#
# tf_example_singleGPU.batch
#
#SBATCH --job-name=tf_example # create a short name for your job
#SBATCH --cpus-per-gpu=16 # Number of cpus per GPU card (>1 if multi-threaded tasks)
#SBATCH --partition=gpu # Name of the partition
#SBATCH --gres=gpu:rtx6000:1 # GPU nodes are only available in gpu partition
#SBATCH --mem=20G # Total memory allocated
#SBATCH --time=00:01:00 # total run time limit (HH:MM:SS)
#SBATCH --output=%x_%j.out # output file name
echo "### Running $SLURM_JOB_NAME ###"
cd ${SLURM_SUBMIT_DIR}
module purge
# Set your conda environment
source /home/$USER/.bashrc
# tensorflow environment should be created previously
source activate tf_env
python ./tf_example.py
To run it go to the folder containing the script and enter:
# cd gpu/tensorflow
sbatch tf_example_singleGPU.batch
This job will train a small CNN, using the MNIST dataset.
Multiple GPUs in one node
The job launched in the previous section can also be launched on several GPUs at the same time. This can be useful, for example, when tuning hyperparameters. This feature uses the “Array of jobs” mechanism, previously described in Single job with changing arguments.
In particular, in this example, a training job is launched to train a CNN using the same dataset as in the previous example. Two BATCH_SIZE values are tested in parallel:
#!/bin/bash
#SBATCH --job-name=tf_example # create a short name for your job
#SBATCH --cpus-per-gpu=16 # Number of cpus per GPU card (>1 if multi-threaded tasks)
#SBATCH --partition=gpu # Name of the partition
#SBATCH --gres=gpu:rtx6000:1 # GPU nodes are only available in gpu partition
#SBATCH --mem=20G # Total memory allocated
#SBATCH --time=00:01:00 # total run time limit (HH:MM:SS)
#SBATCH --output=%x_%j.out # output file name
#SBATCH --array=0-1
echo "### Running $SLURM_JOB_NAME ###"
cd ${SLURM_SUBMIT_DIR}
# running with different parameters, in this case, different batch sizes for
# the training
opt[0]="32"
opt[1]="64"
module purge
# Set your conda environment
source /home/$USER/.bashrc
# tensorflow environment should be created previously
source activate tf_env
python ./tf_example.py --batch_size ${opt[$SLURM_ARRAY_TASK_ID]}
To schedule the job, enter the following command:
# cd gpu/tensorflow
sbatch tf_example_multipleGPU.batch
PyTorch
Using PyTorch requires installing the library. Go to the PyTorch environment section for details on how to do it. Files used in these examples are available here.
Single GPU
This job (pt_example.py) will train a CNN using the MNIST dataset. To launch the job on a single GPU, the following script is used:
#!/bin/bash
#SBATCH --job-name=pt_example # create a short name for your job
#SBATCH --cpus-per-gpu=16 # Number of cpus per GPU card (>1 if multi-threaded tasks)
#SBATCH --partition=gpu # Name of the partition
#SBATCH --gres=gpu:rtx6000:1 # Number and type of GPU cards allocated
#SBATCH --mem=20G # Total memory allocated
#SBATCH --time=00:05:00 # total run time limit (HH:MM:SS)
#SBATCH --output=%x_%j.out # output file name
echo "### Running $SLURM_JOB_NAME ###"
set -x
cd ${SLURM_SUBMIT_DIR}
module purge
# Set your conda environment
source /home/$USER/.bashrc
# pytorch environment should be created previously
source activate pt_env
python ./pt_example.py
To reserve a node and launch the job, type:
sbatch pt_example_singleGPU.batch
Multiple GPUs in one node
In this case, we will launch the same job (pt_example.py), but this time we will implement some basic parallelism by using the “Array of jobs” mechanism again. The Slurm script will schedule 3 jobs with different gamma steps for the training of 3 different CNN models.
#!/bin/bash
#SBATCH --job-name=pt_example # create a short name for your job
#SBATCH --cpus-per-gpu=16 # Number of cpus per GPU card (>1 if multi-threaded tasks)
#SBATCH --partition=gpu # Name of the partition
#SBATCH --gres=gpu:rtx6000:1 # Number and type of GPU cards allocated
#SBATCH --mem=20G # Total memory allocated
#SBATCH --time=00:05:00 # total run time limit (HH:MM:SS)
#SBATCH --output=%x_%j.out # output file name
#SBATCH --array=0-2
echo "### Running $SLURM_JOB_NAME with array task $SLURM_ARRAY_TASK_ID ###"
set -x
cd ${SLURM_SUBMIT_DIR}
module purge
# Set your conda environment
source /home/$USER/.bashrc
# pytorch environment should be created previously
source activate pt_env
GAMMA_STEP=('0.1' '0.5' '1.0')
python ./pt_example.py --gamma ${GAMMA_STEP[$SLURM_ARRAY_TASK_ID]}
To launch the job type:
sbatch pt_example_multiGPU.batch
Other useful options
Here are other useful options you can add in your batch script:
| Option | Role |
|---|---|
| --mail-type | Mail events (NONE, BEGIN, END, FAIL, ALL) |
| --mail-user | Mail address (<email>@inria.fr) |
| --error | Stderr redirection to a different file (ex: %x_%j.err) |
As you can see in the --error option above, you can use patterns to name files: %x refers to the job name, %j to the jobid. This naming scheme can ensure a unique output file name for every run. You can find the exhaustive list of patterns in the sbatch documentation.
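For instance, a batch script combining these options might include the following directives (the mail address is a placeholder; use your own):

```shell
#SBATCH --mail-type=END,FAIL           # Notify by mail when the job ends or fails
#SBATCH --mail-user=jane.doe@inria.fr  # Placeholder address
#SBATCH --output=%x_%j.log             # Standard output: <job name>_<jobid>.log
#SBATCH --error=%x_%j.err              # Standard error in a separate file
```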
Interactive sessions
CLEPS allows launching interactive sessions on compute nodes through SLURM. These sessions are recommended for debugging and fast prototyping. Default time allocation is one hour.
The basic command to get an interactive session is:
salloc [-c nb_cpus ...]
Note that no command is given to salloc; a shell is automatically started on an available node. The default allocation is one core (two cpus on nodes with hyperthreading activated, one cpu otherwise) and a small amount of memory (depending on the partition).
Example: Interactive session on a GPU node, with two rtx6000 and 8 cpus per card allocated:
salloc --gres=gpu:rtx6000:2 -p gpu --cpus-per-gpu=8
If you need X11 redirection when using computing resources interactively, connect to CLEPS with the ssh -X option and add the --x11 option to salloc:
ssh -X <login>@cleps
salloc --x11
Using Jupyter notebooks
Jupyter notebooks can be launched from the CLEPS cluster. This tool can be used for rapid prototyping or quick debugging; for intensive processing, however, launching jobs with a batch script remains the preferred method. Generally speaking, the recommended method is to launch an interactive session and start the Jupyter server there, then redirect the output through an SSH tunnel to your local machine. A browser pointed at the right web address, with the corresponding token, will allow you to work from your local computer.
Step by step:
Create a Conda environment and install JupyterLab (or Jupyter notebook) inside.
conda create --name jupyter_env python=3.10
conda activate jupyter_env
pip install jupyterlab
Connect interactively to a node, in this example we will target a gpu node:
salloc -c 8 --gres=gpu:v100:1 --hint=multithread -p gpu
Activate your Conda environment and launch the notebook:
conda activate jupyter_env
jupyter-lab --no-browser --port 8001 --ip $(hostname)
From your local machine, create a tunnel to the cluster's login node, addressing the node where your interactive session was launched. For example, if the hostname is gpu001, your SSH tunnel will be:
ssh -N -L 8001:gpu001:8001 cleps.paris.inria.fr
Open your browser at http://localhost:8001 and enter the token that you got when the notebook was launched. Your notebook will be ready to start working!
Visualization
Note
August 2024: The ParaView server is not available yet, but will be installed soon.
ParaView
ParaView is available via the module command. It has been compiled with gnu8 and mpich, therefore the command to load it is the following:
module load gnu8 mpich paraview
It is aimed at being used in client/server mode to perform rendering on the server side. There are two main constraints to using ParaView on CLEPS:
Use the exact same version (5.11.1) on the client and on the server (available here)
Launch the server on a gpu node
To use ParaView, start an interactive job on a gpu node, load the required modules, and start the server:
salloc [-c <nb_cpus>] -p gpu --gpus-per-node=1
module load gnu8 mpich paraview
pvserver --force-offscreen-rendering
By default the server listens on port 11111, but you can change it with the -p <port number> option. Then, locally, create an SSH tunnel between your machine and the allocated node (ex: gpu007):
ssh -N cleps -L 8888:gpu007:11111
where 8888 is the local port used to create the tunnel. You can use any port (>1023) you wish, as long as no firewall blocks it. Finally, start the ParaView client on your machine and connect to the server:
Click on the Connect icon
Add Server
Leave localhost but edit the port to match the one you chose locally for the ssh tunnel
If everything goes well you should see a message on the server side confirming the connection with the client.
ParaView can be used on several nodes if your data cannot fit in the RAM of one node. To do so, allocate more than one node with one task per node and start the server with srun:
salloc [-c <nb_cpus>] -N <nb_nodes> --ntasks-per-node=1 -p gpu --gpus-per-node=1
srun pvserver --force-offscreen-rendering
Here is a useful link to optimize your workflow with ParaView.