Lustre filesystem
Lustre is a parallel filesystem, designed to support parallel I/O on large files.
Its high performance comes mainly from its ability to stripe data across multiple storage targets (OSTs): a file can be split into chunks that are stored on different OSTs. Striping brings two main benefits:
Sharing disk usage between different physical resources.
Accessing portions of files located on different OSTs at the same time, resulting in an increased bandwidth.
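To make the chunk placement concrete, here is a minimal sketch of round-robin striping in plain shell arithmetic. It assumes a simple layout with a fixed stripe size and stripe count; `stripe_for_offset` is a hypothetical helper, not a Lustre command, and the value it prints is the position within the file's own set of OSTs, not a real OST index.

```shell
#!/bin/sh
# Illustrative only: which stripe slot (0-based) holds a given byte offset,
# assuming chunks are distributed round-robin over the file's OSTs.
stripe_for_offset() {
    # $1 = byte offset, $2 = stripe size in bytes, $3 = stripe count
    echo $(( ($1 / $2) % $3 ))
}

MiB=$(( 1024 * 1024 ))
# With 4 stripes of 1 MiB, consecutive 1 MiB chunks rotate over the 4 slots.
for chunk in 0 1 2 3 4 5; do
    echo "chunk $chunk -> stripe slot $(stripe_for_offset $(( $chunk * $MiB )) "$MiB" 4)"
done
```

Chunks 0 to 3 land on four different slots, which is why four processes reading them at the same time can each be served by a different OST.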
Architecture

As the picture above shows, Lustre separates data from metadata. The metadata (data location, data size, permissions, …) is stored on a Metadata Server (MDS). The data is stored on a set of disks called Object Storage Targets (OSTs), which are managed by Object Storage Servers (OSSs). At the time of writing, the CLEPS Lustre filesystem includes 3 OSSs and a total of 15 OSTs of 14TiB each.
File striping
As explained above, Lustre has the ability to split files into chunks of data (stripes). For a given file, the way these chunks are distributed over the OSTs is called the file’s layout or striping.
The default striping on /scratch uses a feature called Progressive File Layout. It means that the striping will change according to the file size.
| File size | 0 - 512MiB | 512MiB - 40GiB | More than 40GiB |
|---|---|---|---|
| stripe_count | 1 | 4 | Number of available OSTs |
| stripe_size | 1MiB | 4MiB | 64MiB |
This striping is set on the /scratch folder and inherited by any subdirectories.
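The table can be read as a lookup from a position in the file to a layout. The sketch below assumes that each column applies to the corresponding byte range of the file, with the upper bound excluded; `pfl_component` is a hypothetical helper written for illustration and does not query Lustre.

```shell
#!/bin/sh
# Illustrative only: return the /scratch default layout that would apply
# to a given byte offset, following the table above.
pfl_component() {
    # $1 = byte offset inside the file
    if [ "$1" -lt $(( 512 * 1024 * 1024 )) ]; then
        echo "stripe_count=1 stripe_size=1MiB"
    elif [ "$1" -lt $(( 40 * 1024 * 1024 * 1024 )) ]; then
        echo "stripe_count=4 stripe_size=4MiB"
    else
        # "all" stands for the number of available OSTs
        echo "stripe_count=all stripe_size=64MiB"
    fi
}

pfl_component 0                                  # small file, or start of any file
pfl_component $(( 1024 * 1024 * 1024 ))          # 1 GiB into the file
pfl_component $(( 100 * 1024 * 1024 * 1024 ))    # 100 GiB into the file
```

Small files therefore stay on a single OST, while only files that grow large are spread over more OSTs.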
The default layout should provide close to optimal performance in most use cases. Nevertheless, you can override it and set your own striping on a file or directory with the lfs setstripe command:
lfs setstripe [--stripe-size|-S stripe_size] [--stripe-count|-c stripe_count]
[--stripe-index|-i start_ost_index]
[--pool|-p <pool>] ...
<dirname|filename>
| Parameter | Description |
|---|---|
| -S stripe_size | Number of bytes to store on an OST before moving to the next OST. A stripe_size of 0 uses the filesystem's default stripe size (1 MB). Can be specified with k (KB), m (MB), or g (GB). |
| -c stripe_count | Number of OSTs over which to stripe a file |
| -i start_ost_index | OST index (starting at 0) on which to start striping the file. The default is -1, which lets the MDS choose (recommended) |
Examples:
# Create a file that inherits the default layout of its directory
touch testfile
# Set the default layout for new files created in the current directory
lfs setstripe -c 4 -S 1M .
# Create a file striped over two OSTs
lfs setstripe -c 2 -S 4M testfile2
# Create a file on a single OST
lfs setstripe -c 1 -S 2M testfile3
In the last example, testfile3 is striped over a single OST and will be written in chunks of 2MiB. Note that the file size isn't known at creation time: lfs setstripe only sets the layout of the file.
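The order of operations matters: the layout is chosen first, then the data is written. The sketch below walks through that sequence for a file like testfile3. `DRY_RUN`, `run` and `stripe_and_write` are illustrative helpers, not part of Lustre. By default the script only prints the commands, so it can be read anywhere; set `DRY_RUN=0` on a Lustre client, in a directory where the target file does not exist yet, to execute them.

```shell
#!/bin/sh
# Sketch: set a layout on a new file, write to it, then inspect the result.
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands only, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

stripe_and_write() {
    # $1 = name of the file to create
    # 1. Choose the layout first: one OST, 2 MiB stripes.
    run lfs setstripe -c 1 -S 2M "$1"
    # 2. Write 8 MiB of data; the file size only becomes known at this point.
    run dd if=/dev/zero of="$1" bs=1M count=8
    # 3. Display the layout of the file.
    run lfs getstripe "$1"
}

stripe_and_write testfile3
```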
The lfs getstripe command lists striping information of a file or directory:
lfs getstripe [--quiet|-q] [--verbose|-v]
[--count|-c] [--index|-i | --offset|-o]
[--size|-s] [--pool|-p] [--directory|-d]
[--recursive|-r] ...
<dirname|filename>
| Parameter | Description |
|---|---|
| -q | Print allocation information only, not the layout |
| -c | Stripe count (number of OSTs the file is striped over) |
| -s | Stripe size (how much data is written on one OST before moving to the next) |
| -p | List the pools to which a file belongs |
| -d | Print the layout of the directory itself rather than of the files it contains |
| -r | Recurse into all subdirectories |
Stripe alignment
To benefit from striping when several processes access the same file in parallel, make sure that each process of your program reads or writes a part of the file located on a different OST. This prevents processes from competing for the same OST and helps performance.
You can find more information about stripe alignment [here](https://www.nics.utk.edu/computing-resources/file-systems/io-lustre-tips#stripe-alignment).
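One common way to apply this is to give each process a contiguous block whose size is a multiple of the stripe size, so that no stripe is shared between two processes. The sketch below only computes the offsets. It assumes a 1 MiB stripe size matching the file's layout and one block per process; `round_up_to_stripe` and `rank_offset` are hypothetical helpers.

```shell
#!/bin/sh
# Illustrative only: stripe-aligned start offsets for 4 processes.

# Round a per-process block size up to the next multiple of the stripe size.
round_up_to_stripe() {
    # $1 = bytes needed, $2 = stripe size in bytes
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Byte offset at which a given rank (0-based) starts its block.
rank_offset() {
    # $1 = rank, $2 = block size in bytes
    echo $(( $1 * $2 ))
}

stripe_size=$(( 1024 * 1024 ))   # 1 MiB, assumed to match the file's layout
payload=3000000                  # bytes each process actually writes
block=$(round_up_to_stripe "$payload" "$stripe_size")

for rank in 0 1 2 3; do
    echo "rank $rank writes at offset $(rank_offset "$rank" "$block")"
done
```

Aligning the blocks guarantees that two processes never touch the same stripe. Whether they also land on different OSTs depends on the stripe count: with a block of exactly one stripe and as many processes as stripes, each process gets its own OST.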
External documentation
For those who are interested in more details about how to manage file striping, here are some useful links:
NASA Lustre filesystem documentation, especially the best practices section.