CLEPS architecture
==================

What is a cluster?
------------------

A computing cluster is a group of interconnected computers. The typical
architecture is composed of:

* a login node
* several computing nodes
* storage nodes
* a network

.. image:: ../img/ArchitectureSimpleCluster.png

In the case of CLEPS, users connect to the login node and schedule their jobs
from there. The `scheduler` (also called the resource manager) is responsible
for finding available computing resources and starting jobs on them.

.. _comp_nodes:

CLEPS Compute nodes
-------------------

All the nodes run CentOS 7.9 with Linux kernel 3.10.0.

.. cssclass:: table-striped

.. list-table::
   :header-rows: 1

   * - id
     - number of nodes
     - RAM
     - ``/local`` disk
     - processor type
     - hyperthreading
     - total number of cores
     - average memory per core
     - Features
     - InfiniBand network
     - GPU/node
     - CPUs/GPU [*]_
   * - node0[01-20]
     - 20
     - 192 GB, 2667 MHz
     - 220 GB, 6 GB/s, SSD
     - 2x Cascade Lake Intel Xeon 5218, 16 cores, 2.4 GHz
     - yes
     - 640
     - 6 GB
     - hyperthreading,192go,cascadelake
     - 100 Gb/s
     - none
     - n/a
   * - node0[21-24]
     - 4
     - 176 GB, 2400 MHz
     - 600 GB, 6 GB/s, HDD
     - 2x Intel Xeon E5-2650 v4, 12 cores
     - yes
     - 96
     - 7.3 GB
     - hyperthreading,176go,broadwell
     - 56 Gb/s
     - none
     - n/a
   * - node0[25-28]
     - 4
     - 128 GB, 2667 MHz
     - 800 GB, 6 GB/s, HDD
     - 2x Skylake Intel Xeon 5118, 12 cores
     - yes
     - 96
     - 5.3 GB
     - hyperthreading,128go,skylake
     - 100 Gb/s
     - none
     - n/a
   * - node0[29-40]
     - 8
     - 192 GB, 2400 MHz
     - 800 GB, 6 GB/s, HDD
     - 2x Broadwell Xeon E5-2650 v4, 12 cores
     - yes
     - 288
     - 8 GB
     - hyperthreading,192go,broadwell
     - 56 Gb/s
     - none
     - n/a
   * - node0[41-44]
     - 4
     - 256 GB, 3200 MHz
     - 370 GB, 6 GB/s, SSD
     - 2x AMD EPYC 7352, 24 cores, 2.3 GHz
     - yes
     - 192
     - 5.3 GB
     - hyperthreading,amd,256go
     - 100 Gb/s
     - none
     - n/a
   * - node0[45-48]
     - 4
     - 128 GB, 2133 MHz
     - 800 GB, 6 GB/s, HDD
     - 2x Broadwell Xeon E5-2695 v3, 14 cores
     - yes
     - 112
     - 4.6 GB
     - hyperthreading,128go,broadwell
     - 56 Gb/s
     - none
     - n/a
   * - node0[49-56]
     - 8
     - 128 GB, 2400 MHz
     - 800 GB, 6 GB/s, HDD
     - 2x Broadwell Xeon E5-2695 v4, 18 cores
     - no
     - 288
     - 3.6 GB
     - nohyperthreading,128go,broadwell
     - 56 Gb/s
     - none
     - n/a
   * - mem001
     - 1
     - 3 TB, 1333 MHz
     - 200 GB, 6 GB/s, HDD
     - 4x Intel Xeon E7-4860 v2, 12 cores, 2.6-3.2 GHz
     - no
     - 48
     - 62.5 GB
     - nohyperthreading,3to
     - 56 Gb/s
     - none
     - n/a
   * - gpu001
     - 1
     - 192 GB, 2667 MHz
     - 3.8 TB, 12 GB/s, SSD
     - 2x Cascade Lake Intel Xeon 5217, 8 cores, 3-3.7 GHz
     - no
     - 16
     - 12 GB
     - nohyperthreading,192go,v100
     - 100 Gb/s
     - 2x Nvidia V100 32GB
     - 8
   * - gpu00[2-3]
     - 2
     - 192 GB, 3200 MHz
     - 1.5 TB, 12 GB/s, NVMe
     - 2x AMD EPYC 7302, 16 cores, 3-3.3 GHz
     - yes
     - 64
     - 6 GB
     - hyperthreading,192go,rtx6000
     - 100 Gb/s
     - 3x Nvidia RTX6000 24GB
     - 16
   * - gpu00[4-5]
     - 2
     - 96 GB, 2400 MHz
     - 200 GB, 6 GB/s, HDD
     - 2x Skylake Intel Xeon 5118, 12 cores, 2.3-3.2 GHz
     - yes
     - 48
     - 4 GB
     - hyperthreading,96go,gtx1080ti
     - 56 Gb/s
     - 4x Nvidia GTX 1080Ti
     - 12
   * - gpu00[6-9]
     - 4
     - 192 GB, 3200 MHz
     - 1.5 TB, 12 GB/s, NVMe
     - 2x AMD EPYC 7302, 16 cores, 3-3.3 GHz
     - yes
     - 128
     - 6 GB
     - hyperthreading,192go,rtx8000
     - 100 Gb/s
     - 3x Nvidia RTX8000 48GB
     - 16
   * - gpu011
     - 1
     - 128 GB, 2400 MHz
     - 200 GB, 6 GB/s, HDD
     - 2x Intel Xeon E5-2650L v4, 14 cores, 1.7-2.5 GHz
     - yes
     - 28
     - 4 GB
     - hyperthreading,128go,rtx2080ti
     - 56 Gb/s
     - 4x Nvidia RTX2080Ti 12GB
     - 6
   * - gpu01[2-3]
     - 2
     - 256 GB, 3200 MHz
     - 3.6 GB, 12 GB/s, HDD
     - AMD EPYC 7543P, 32 cores, 2.8 GHz
     - yes
     - 56
     - 4 GB
     - hyperthreading,256go,a100
     - 100 Gb/s
     - 4x Nvidia A100 80GB
     - 14

.. [*] Maximum number of CPUs you can allocate with ``--cpus-per-task`` per
   allocated GPU card.

You will notice that some nodes have hyperthreading activated. On these nodes
you can allocate twice as many logical cores (threads) as there are physical
cores. For example, on node001 to node020 you can allocate a maximum of 64
logical cores.
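For instance, on one of the node0[01-20] nodes (which form the ``cpu_homogen``
partition described below) you can hand all 64 logical cores to a single
multithreaded task. The request below is only a sketch: ``my_threaded_app`` is
a placeholder for your own executable.

.. code-block:: console

   # Request one node0[01-20] node and all 64 of its logical cores for one task
   srun -p cpu_homogen --nodes=1 --ntasks=1 --cpus-per-task=64 ./my_threaded_app

Conversely, if you prefer to bind one task per physical core and ignore the
hyperthreads, you can add Slurm's ``--hint=nomultithread`` option to the
request.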
CLEPS Partitions
----------------

When you submit a job with ``srun`` or ``sbatch``, you submit it to a
partition (similar to a queue). Nodes in a partition share a common purpose or
configuration. To specify a partition when submitting a job, add the ``-p`` or
``--partition`` option, followed by the name of the partition.

.. code-block:: console

   # To submit your job into the cpu_homogen partition
   srun -N 2 -p cpu_homogen

.. code-block:: console

   $ cat .batch
   #!/bin/bash
   #SBATCH --partition=cpu_homogen
   ...

If none is specified, your job will run in the default partition ``cpu_devel``.

.. cssclass:: table-striped

+----------------+-------------+--------------+--------------------------+
| partition name | nodes       | max job      | purpose/configuration    |
|                |             | duration     |                          |
+================+=============+==============+==========================+
| cpu_devel      | node021-056 | 1 week       | Tests, compilations      |
|                |             |              | and small jobs           |
+----------------+-------------+--------------+--------------------------+
| cpu_homogen    | node001-020 | 1 week       | Homogeneous set of       |
|                |             |              | nodes. Suited for        |
|                |             |              | scaling studies of       |
|                |             |              | MPI jobs                 |
+----------------+-------------+--------------+--------------------------+
| gpu            | gpu001-009, | 2 days       | Nodes equipped with      |
|                | gpu01[1-3]  |              | GPUs                     |
+----------------+-------------+--------------+--------------------------+
| mem            | mem001      | 2 days       | Large memory node        |
+----------------+-------------+--------------+--------------------------+
| \*almanach     | gpu009      | 2 days       | GPU node, ALMANACH       |
|                |             |              | priority                 |
+----------------+-------------+--------------+--------------------------+
| \*willow       | gpu01[2-3]  | 2 days       | GPU nodes, WILLOW        |
|                |             |              | priority                 |
+----------------+-------------+--------------+--------------------------+
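You can also query this partition layout directly from the login node with
Slurm's standard commands, for example:

.. code-block:: console

   # One line per partition: name, time limit, node count and node list
   sinfo -o "%P %l %D %N"

   # Detailed configuration of a single partition
   scontrol show partition gpu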
.. warning::

   Projects can buy computing resources and include them in the CLEPS
   infrastructure. These resources benefit from the whole infrastructure and
   its mechanisms, such as scheduling. The project also gets priority access
   to its own resources, while leaving them accessible to users from other
   projects when they are not in use. This Slurm mechanism is known as
   **job preemption**.

   Such resources are therefore present in two different **partitions**: a
   generic one that makes them available to everyone, and a higher priority
   one, only available to the members of the project that funded the
   resources. Such higher priority partitions are marked with a `*` in the
   table above. The ``gpu`` partition is currently the only one concerned. Be
   aware that submitting to this partition could start your job on a
   `proprietary` resource, also included in either the `almanach` or `willow`
   partition. If you do not want to take the risk of being preempted by a
   higher priority job, you can explicitly exclude `proprietary` nodes from
   your allocation request with the ``--exclude`` option. Example:

   .. code-block:: console

      srun -p gpu --exclude=gpu009,gpu01[2-3] ...

   will exclude nodes gpu009, gpu012 and gpu013 from your allocation.

If you belong to a team that benefits from priority access to some hardware,
you have to specify both your **partition** AND your **account**, e.g. for the
members of the ALMANACH team:

.. code-block:: console

   srun -p almanach -A almanach [options]

Node features
~~~~~~~~~~~~~

In the table :ref:`comp_nodes`, you will notice a ``Features`` column. These
features make it possible to target nodes with certain characteristics within
a partition.

Example: you want to target nodes with AMD processors in the ``cpu_devel``
partition (the default partition):

.. code-block:: console

   srun --constraint=amd ...

See the `Slurm documentation `_ for more information.
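Features combine naturally with the partition and GPU options seen above. The
request below is only an illustrative sketch: the resource numbers follow the
rtx8000 row of the compute node table, and ``my_script.py`` is a placeholder.

.. code-block:: console

   # One RTX8000 node in the gpu partition: 1 GPU and at most 16 CPUs per GPU
   srun -p gpu --constraint=rtx8000 --gres=gpu:1 --cpus-per-task=16 python my_script.py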
CLEPS Storage
-------------

The :code:`/home` path
~~~~~~~~~~~~~~~~~~~~~~~

Your :code:`/home` path is the preferred place to compile your code and do
small development tasks. It is **backed up**, so it is also a good place to
store important data. It is accessible through the `$HOME` environment
variable.

* Capacity and quotas

This partition is a 9 TB XFS filesystem and your disk space quota is set to
100 GB. To check your disk usage:

.. code-block:: console

   cd
   du -sh .

The :code:`/scratch` path
~~~~~~~~~~~~~~~~~~~~~~~~~~

The scratch partition is a Lustre parallel filesystem and is therefore
designed to support **large-file** parallel IO. It is **not backed up**, so it
is not the right place to leave important data. It is accessible through the
`$SCRATCH` environment variable.

* Capacity and quotas

There are currently 500 TB available, and a project quota of 20 TB is applied
to each GID (= team/EPI/service). If you need more space, you can contact the
support directly via the `helpdesk `_.

To check your project quota status:

.. code-block:: console

   # First, get your project ID (your numeric group ID is given by `id -g`)
   grep <your_gid> /etc/lustre-projectid-gid | cut -c1

   # Then query the quota for that project ID
   lfs quota -h -p <project_id> /scratch

Lustre offers many tuning parameters to increase performance, even for small
files. Check the :ref:`file_striping` page to learn how to tune your
:code:`/scratch` tree.

Two special folders are available on the `/scratch` partition:

* `/scratch/_projets_/`, a folder shared by all the members of a project,
  accessible through the `$PROJECT` environment variable.
* `/scratch/_public_`, a read-only folder accessible to every user. It can be
  used to store large data shared by several projects. An explicit request
  must be made for the admins to write the data there. It is accessible
  through the `$PUBLIC` variable.

The :code:`/local` path
~~~~~~~~~~~~~~~~~~~~~~~

As the name suggests, this storage is local to each node. It can only be
accessed while a job is running, through the `$TMP_DIR` variable. You can see
how much space is available on each node in the :ref:`comp_nodes` section.

.. warning::

   This folder (:code:`/local`) is a **temporary** storage solution, available
   at the scale of a running job. Once your job is over, all your data there
   are erased.
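A common way to use this local space is to stage data in and out of
:code:`$TMP_DIR` inside a batch script. The script below is only a sketch: the
dataset path and the ``my_app`` executable are placeholders.

.. code-block:: bash

   #!/bin/bash
   #SBATCH --partition=cpu_devel
   #SBATCH --job-name=local-staging

   # Copy the input data from /scratch to the node-local disk
   cp -r "$SCRATCH/my_dataset" "$TMP_DIR/"

   # Run the computation on the local copy (my_app is a placeholder executable)
   cd "$TMP_DIR"
   "$HOME/my_app" my_dataset

   # Save the results back to a persistent filesystem before the job ends
   cp -r results "$SCRATCH/"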