Lustre filesystem
------------------

The Lustre filesystem is a parallel filesystem designed to support **large-file** parallel IO. The main feature behind its high performance is its ability to stripe data across multiple storage targets (OSTs). This means that your files can be split into chunks that are stored on different OSTs. Striping has two main benefits:

* Sharing disk usage between different physical resources.
* Accessing portions of a file located on different OSTs at the same time, resulting in increased bandwidth.

Architecture
^^^^^^^^^^^^^

.. image:: ../img/ArchitectureLustre.png

As shown in the picture above, Lustre separates the data from the metadata. The metadata, which holds information such as data location, data size and permissions, is stored on a Metadata Server (MDS). The data is stored on a set of disks called Object Storage Targets (OSTs), which are managed by Object Storage Servers (OSSs). At the time of writing, the CLEPS Lustre filesystem includes 3 OSSs and a total of 15 OSTs of 14TiB each.

.. _file_striping:

File striping
^^^^^^^^^^^^^^

As explained above, Lustre has the ability to split files into chunks of data (stripes). For a given file, the way these chunks are distributed over the OSTs is called the file's layout or striping. The default striping on ``/scratch`` uses a feature called ``Progressive File Layout``, which means that the striping changes according to the file size:


+--------------+------------+----------------+--------------------------+
| File size    | 0 - 512MiB | 512MiB - 40GiB | More than 40GiB          |
+==============+============+================+==========================+
| stripe_count | 1          | 4              | Number of available OSTs |
+--------------+------------+----------------+--------------------------+
| stripe_size  | 1MiB       | 4MiB           | 64MiB                    |
+--------------+------------+----------------+--------------------------+

This striping is set on the ``/scratch`` folder and inherited by all its subdirectories. It should provide close to optimal performance in most use cases. Nevertheless, you can override this default setting and set your own striping on a file or directory with the ``lfs setstripe`` command:

.. code-block:: console

    lfs setstripe [--size|-S stripe_size] [--count|-c stripe_cnt] [--index|-i|--offset|-o start_ost_index] [--pool|-p ] ...

+--------------------+------------------------------------------------------+
| Parameter          | Description                                          |
+====================+======================================================+
| -S stripe_size     | Number of bytes to store on an OST before moving to  |
|                    | the next OST. A stripe_size of 0 uses the file       |
|                    | system's default stripe size (default is 1 MB).      |
|                    | Can be specified with k (KB), m (MB), or g (GB).     |
+--------------------+------------------------------------------------------+
| -c stripe_count    | Number of OSTs over which to stripe a file           |
+--------------------+------------------------------------------------------+
| -i start_ost_index | OST index (starting at 0) on which to start striping |
|                    | of the file. Default is -1, let the MDS choose       |
|                    | (recommended)                                        |
+--------------------+------------------------------------------------------+

Examples:

.. code-block:: console

    # Create a file with the default layout of the directory
    touch testfile

    # Set a default layout for all new files in a directory
    lfs setstripe -c 4 -S 1M <directory>

    # Create a file striped over two OSTs
    lfs setstripe -c 2 -S 4M testfile2

    # Create a file on a single OST
    lfs setstripe -c 1 -S 2M testfile3

In the last example, ``testfile3`` is striped over a single OST and will be written in chunks of 2MiB. Note that the file size is not known at creation time: ``lfs setstripe`` only sets the layout of the file.

The ``lfs getstripe`` command lists the striping information of a file or directory:

.. code-block:: console

    lfs getstripe [--quiet|-q] [--verbose|-v] [--count|-c] [--index|-i | --offset|-o] [--size|-s] [--pool|-p] [--directory|-d] [--recursive|-r] ...

+-----------+-----------------------------------------------------------------------------+
| Parameter | Description                                                                 |
+===========+=============================================================================+
| -q        | Allocation information, not layout                                          |
+-----------+-----------------------------------------------------------------------------+
| -c        | Stripe count (number of OSTs the file is striped over)                      |
+-----------+-----------------------------------------------------------------------------+
| -s        | Stripe size (how much data is written on one OST before moving to the next) |
+-----------+-----------------------------------------------------------------------------+
| -p        | List the pools to which a file belongs                                      |
+-----------+-----------------------------------------------------------------------------+
| -d        | Print layout information of files in a directory                            |
+-----------+-----------------------------------------------------------------------------+
| -r        | Recurse into all subdirectories                                             |
+-----------+-----------------------------------------------------------------------------+

Stripe alignment
^^^^^^^^^^^^^^^^^

If you want to benefit from striping when accessing a file in parallel, you need to ensure that the different processes of your program
read or write parts of the file that are located on different OSTs. This avoids concurrent accesses to the same OST and gives the best performance. You can find more information about stripe alignment `here <https://www.nics.utk.edu/computing-resources/file-systems/io-lustre-tips#stripe-alignment>`_.

External documentation
^^^^^^^^^^^^^^^^^^^^^^

For those interested in more details about how to manage file striping, here are some useful links:

* `Lustre documentation `_
* `NASA Lustre filesystem documentation `_ and especially the `best practices part `_.
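
As a complement to the stripe alignment section, the arithmetic behind it can be sketched in a few lines of shell. This is only an illustration: ``stripe_slot`` is a hypothetical helper, not part of ``lfs``, and it assumes a plain round-robin layout with a fixed stripe size and stripe count (as set by ``lfs setstripe -c 4 -S 4M``), not the progressive default layout of ``/scratch``.

.. code-block:: bash

    # Hypothetical helper (not an lfs command): position, within the file's
    # set of OSTs, of the stripe that holds a given byte offset, assuming a
    # plain round-robin layout with fixed stripe_size and stripe_count.
    stripe_slot() {
        local offset=$1 stripe_size=$2 stripe_count=$3
        echo $(( (offset / stripe_size) % stripe_count ))
    }

    MiB=$(( 1024 * 1024 ))
    stripe_size=$(( 4 * MiB ))   # assumed layout: lfs setstripe -c 4 -S 4M
    stripe_count=4

    # Four processes (ranks 0 to 3), each starting at rank * stripe_size
    for rank in 0 1 2 3; do
        offset=$(( rank * stripe_size ))
        slot=$(stripe_slot "$offset" "$stripe_size" "$stripe_count")
        echo "rank $rank: offset $offset -> slot $slot"
    done

With these assumed values, each rank starts on a stripe boundary and lands on a different slot, so no two processes address the same OST at the same time. If the per-process block size were not a multiple of the stripe size, some blocks would straddle two stripes and several ranks could end up on the same OST.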