CLEPS architecture

What is a cluster?

A computing cluster is a group of interconnected computers. The typical architecture is composed of:

  • a login node

  • several computing nodes

  • storage nodes

  • a network

[Figure: typical architecture of a simple cluster — login node, compute nodes, storage nodes, interconnection network]

In the case of CLEPS, users connect to the login node and schedule their jobs from there. The scheduler (also called the resource manager) is responsible for finding available computing resources and starting jobs on them.
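As a minimal illustration (assuming your account is already set up and you have a shell on the login node), here is a sketch of a first interaction with the scheduler; hostname stands in for a real program:

# Run a single task on one compute node chosen by the scheduler
srun -N 1 -n 1 hostname

# List your pending and running jobs
squeue -u $USER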

CLEPS Compute nodes

All the nodes run CentOS 7.9 with Linux kernel 3.10.0.

| id | number of nodes | RAM | /local disk | processor type | hyperthreading | total cores | average memory per core | Features | InfiniBand network | GPU/node | CPUs/GPU* |
|----|-----------------|-----|-------------|----------------|----------------|-------------|-------------------------|----------|--------------------|----------|-----------|
| node0[01-20] | 20 | 192 GB, 2667 MHz | 220 GB SSD, 6 GB/s | 2x Cascade Lake Intel Xeon 5218, 16 cores, 2.4 GHz | yes | 640 | 6 GB | hyperthreading,192go,cascadelake | 100 Gb/s | none | - |
| node0[21-24] | 4 | 176 GB, 2400 MHz | 600 GB HDD, 6 GB/s | 2x Intel Xeon E5-2650 v4, 12 cores | yes | 96 | 7.3 GB | hyperthreading,176go,broadwell | 56 Gb/s | none | - |
| node0[25-28] | 4 | 128 GB, 2667 MHz | 800 GB HDD, 6 GB/s | 2x Skylake Intel Xeon 5118, 12 cores | yes | 96 | 5.3 GB | hyperthreading,128go,skylake | 100 Gb/s | none | - |
| node0[29-40] | 12 | 192 GB, 2400 MHz | 800 GB HDD, 6 GB/s | 2x Broadwell Intel Xeon E5-2650 v4, 12 cores | yes | 288 | 8 GB | hyperthreading,192go,broadwell | 56 Gb/s | none | - |
| node0[41-44] | 4 | 256 GB, 3200 MHz | 370 GB SSD, 6 GB/s | 2x AMD EPYC 7352, 24 cores, 2.3 GHz | yes | 192 | 5.3 GB | hyperthreading,amd,256go | 100 Gb/s | none | - |
| node0[45-48] | 4 | 128 GB, 2133 MHz | 800 GB HDD, 6 GB/s | 2x Broadwell Intel Xeon E5-2695 v3, 14 cores | yes | 112 | 4.6 GB | hyperthreading,128go,broadwell | 56 Gb/s | none | - |
| node0[49-56] | 8 | 128 GB, 2400 MHz | 800 GB HDD, 6 GB/s | 2x Broadwell Intel Xeon E5-2695 v4, 18 cores | no | 288 | 3.6 GB | nohyperthreading,128go,broadwell | 56 Gb/s | none | - |
| mem001 | 1 | 3 TB, 1333 MHz | 200 GB HDD, 6 GB/s | 4x Intel Xeon E7-4860 v2, 12 cores, 2.6-3.2 GHz | no | 48 | 62.5 GB | nohyperthreading,3to | 56 Gb/s | none | - |
| gpu001 | 1 | 192 GB, 2667 MHz | 3.8 TB SSD, 12 GB/s | 2x Cascade Lake Intel Xeon 5217, 8 cores, 3-3.7 GHz | no | 16 | 12 GB | nohyperthreading,192go,v100 | 100 Gb/s | 2x Nvidia V100 32 GB | 8 |
| gpu00[2-3] | 2 | 192 GB, 3200 MHz | 1.5 TB NVMe, 12 GB/s | 2x AMD EPYC 7302, 16 cores, 3-3.3 GHz | yes | 64 | 6 GB | hyperthreading,192go,rtx6000 | 100 Gb/s | 3x Nvidia RTX6000 24 GB | 16 |
| gpu00[4-5] | 2 | 96 GB, 2400 MHz | 200 GB HDD, 6 GB/s | 2x Skylake Intel Xeon 5118, 12 cores, 2.3-3.2 GHz | yes | 48 | 4 GB | hyperthreading,96go,gtx1080ti | 56 Gb/s | 4x Nvidia GTX 1080Ti | 12 |
| gpu00[6-9] | 4 | 192 GB, 3200 MHz | 1.5 TB NVMe, 12 GB/s | 2x AMD EPYC 7302, 16 cores, 3-3.3 GHz | yes | 128 | 6 GB | hyperthreading,192go,rtx8000 | 100 Gb/s | 3x Nvidia RTX8000 48 GB | 16 |
| gpu011 | 1 | 128 GB, 2400 MHz | 200 GB HDD, 6 GB/s | 2x Intel Xeon E5-2650L v4, 14 cores, 1.7-2.5 GHz | yes | 28 | 4 GB | hyperthreading,128go,rtx2080ti | 56 Gb/s | 4x Nvidia RTX2080Ti 12 GB | 6 |
| gpu01[2-3] | 2 | 256 GB, 3200 MHz | 3.6 GB HDD, 12 GB/s | AMD EPYC 7543P, 32 cores, 2.8 GHz | yes | 56 | 4 GB | hyperthreading,256go,a100 | 100 Gb/s | 4x Nvidia A100 80 GB | 14 |

* Maximum number of CPUs you can allocate with --cpus-per-task per allocated GPU card.
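For instance, here is a hedged sketch of a GPU request that respects this limit on gpu001 (2 GPUs, at most 8 CPUs per GPU); the job name is a placeholder:

# Request 1 GPU and up to 8 CPUs for that GPU in the gpu partition
srun -p gpu --gres=gpu:1 --cpus-per-task=8 <my_gpu_job>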

You’ll notice that some nodes have hyperthreading activated. This means that you can allocate twice as many logical cores (threads) as there are physical cores on these nodes. For example, on node001 to node020 (2 x 16 physical cores), you can allocate a maximum of 64 logical cores per node.
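As an illustration (a sketch only; the program name is a placeholder), a batch job using all 64 logical cores of one cpu_homogen node could be requested like this:

#!/bin/bash
#SBATCH --partition=cpu_homogen
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64   # 32 physical cores x 2 hyperthreads
srun <my_multithreaded_program>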

CLEPS Partitions

When you submit a job with srun or sbatch, you submit it into a partition (similar to a queue). Nodes in a partition share a common purpose or configuration. To specify a partition when submitting a job, add the -p or --partition option, followed by the name of the partition.

# To submit your job into the cpu_homogen partition with srun
srun -N 2 -p cpu_homogen <myjob>

# Or, equivalently, inside a batch script
$ cat <my_batch_script>.batch
#!/bin/bash
#SBATCH --partition=cpu_homogen
...

If none is specified, your job will run in the default partition, cpu_devel.
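You can also list the partitions you have access to, with their time limits and node lists, directly from the login node (the output may differ slightly from the table below):

# Summary view: one line per partition
sinfo -s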

| partition name | nodes | max job duration | purpose/configuration |
|----------------|-------|------------------|-----------------------|
| cpu_devel | node021-056 | 1 week | Tests, compilations and small jobs |
| cpu_homogen | node001-020 | 1 week | Homogeneous set of nodes, suited for scaling studies of MPI jobs |
| gpu | gpu001-009, gpu01[1-3] | 2 days | Nodes equipped with GPUs |
| mem | mem001 | 2 days | Large memory node |
| *almanach | gpu009 | 2 days | GPU node, ALMANACH priority |
| *willow | gpu01[2-3] | 2 days | GPU node, WILLOW priority |

Warning

Projects have the possibility to buy computing resources and include them in the CLEPS infrastructure. They benefit from the whole infrastructure and its mechanisms, such as scheduling. They also get priority access to their own resources, while leaving them accessible to users from other projects when they are not in use. This Slurm mechanism is known as job preemption.

Such resources are therefore present in two different partitions: the generic one that makes them available to everyone, and a higher-priority one, only available to members of the project that funded the resources. Such higher-priority partitions are marked with a * in the table above.

The gpu partition is currently the only one concerned. Be aware that submitting to this partition could start your job on a project-owned resource, which is also included in either the almanach or willow partition.

If you don’t want to take the risk of being preempted by a higher-priority job, you can explicitly exclude project-owned nodes from your allocation request with the --exclude option.

Example:

srun -p gpu --exclude=gpu009,gpu01[2-3] ...

will exclude nodes gpu009, gpu012, gpu013 from your allocation.

If you belong to a team that benefits from priority access to some hardware, you have to specify both your partition AND your account. For example, for members of the ALMANACH team:

srun -p almanach -A almanach [options] <my_script_name>

Node features

In the table CLEPS Compute nodes, you’ll notice a Features column. It allows you to target nodes with certain characteristics within a partition.

Example:

You want to target nodes with AMD processors in the cpu_devel partition (default partition):

srun --constraint=amd ...
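
Constraints can also be combined; for example (a sketch, using feature names from the table above):

# Require Broadwell nodes without hyperthreading (features are ANDed with &)
srun --constraint="broadwell&nohyperthreading" ...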

See the Slurm documentation for more information.

CLEPS Storage

The /home path

Your /home path is the preferred place to compile your code and do small development tasks. It is backed up, so it is also a good place to store important data. It is accessible through the $HOME environment variable.

  • Capacity and quotas

This partition is a 9 TB XFS filesystem, and your disk space quota is set to 100 GB.

To check your disk usage:

cd
du -sh .

The /scratch path

The scratch partition is a Lustre parallel filesystem and is therefore designed to support large-file parallel I/O. It is not backed up, so it is not the right place to leave important data. It is accessible through the $SCRATCH environment variable.

  • Capacity and quotas

There are currently 500 TB available, and a project quota of 20 TB is applied to each GID (team/EPI/service). If you need more space, you can contact support directly via the helpdesk.

To check your project quota status:

# First, get your project ID
grep <group> /etc/lustre-projectid-gid | cut -c1
# Then query the quota for that project on /scratch
lfs quota -ph <project ID> /scratch

Lustre offers many tuning parameters to improve performance, even for small files. Check the File striping page to learn how to tune your /scratch tree.
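As a minimal, hedged example (the directory name is a placeholder and the stripe count of 4 is arbitrary; choose values based on the File striping page and your file sizes), striping can be set on a directory so that new files written inside it are spread across several storage targets:

# Create a directory and stripe its new files across 4 OSTs, 1 MB per stripe
mkdir -p $SCRATCH/my_large_outputs
lfs setstripe -c 4 -S 1M $SCRATCH/my_large_outputs
# Check the resulting layout
lfs getstripe $SCRATCH/my_large_outputs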

Two special folders are available on the /scratch partition (see the usage example after this list):

  • /scratch/_projets_/<user_main_group>, a folder shared by all the members of a project, accessible through the $PROJECT environment variable.

  • /scratch/_public_, a read-only folder accessible to every user. It can be used to store large data shared by several projects. An explicit request must be made to the admins to have data written there. It is accessible through the $PUBLIC environment variable.
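
For instance, a hedged sketch (dataset and folder names are placeholders) of how these variables are typically used from a job or an interactive shell:

# Copy a shared dataset from the read-only public area into your project space
cp -r $PUBLIC/<shared_dataset> $PROJECT/

# Work in your own scratch directory
mkdir -p $SCRATCH/experiments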

The /local path

As the name suggests, this storage is local to each node. It can be accessed only while a job is running, through the $TMP_DIR variable. You can see how much space is available on each node in the CLEPS Compute nodes section.
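A typical, hedged usage pattern inside a single-node batch script (input and program names are placeholders) is to stage data into $TMP_DIR, compute against the local copy, then copy the results back to /scratch before the job ends:

#!/bin/bash
#SBATCH --partition=cpu_devel

# Stage input data onto the node-local disk
cp $SCRATCH/<input_data> $TMP_DIR/

# Run the computation against the local copy
srun <my_program> $TMP_DIR/<input_data> $TMP_DIR/results

# Copy the results back before the job ends (local data is erased afterwards)
cp -r $TMP_DIR/results $SCRATCH/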

Warning

This folder (/local) is a temporary storage space, available only for the duration of a running job. Once your job is over, all your data there is erased.