Lustre filesystem

Lustre is a parallel filesystem, designed to support parallel I/O on large files.

The main feature behind the high performance of this filesystem is its ability to stripe data across multiple storage targets (OSTs): a file can be split into chunks that are stored on different OSTs. Striping mainly provides two benefits:

  • Spreading disk usage over several physical devices.

  • Accessing portions of a file located on different OSTs at the same time, which increases bandwidth.
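With a plain layout (a single stripe_count and stripe_size for the whole file), chunks are dealt out to the file's OSTs in round-robin order, so the OST holding a given byte follows from simple arithmetic. The POSIX shell sketch below illustrates that arithmetic; the function name is hypothetical, and it returns the position within the file's own list of OSTs (0 for the first OST of the layout), not the global OST index.

```shell
# Hypothetical helper: which of a file's OSTs holds a given byte offset,
# assuming a plain round-robin (RAID-0 style) layout.
# Usage: ost_slot_for_offset <offset> <stripe_size> <stripe_count>
ost_slot_for_offset() {
    offset=$1 stripe_size=$2 stripe_count=$3
    # Index of the chunk (stripe) that contains the offset...
    stripe_number=$(( offset / stripe_size ))
    # ...and chunks are assigned to the file's OSTs in turn.
    echo $(( stripe_number % stripe_count ))
}

# 1MiB stripes over 4 OSTs: the 5th MiB wraps back to the first OST.
ost_slot_for_offset 0 1048576 4         # prints 0
ost_slot_for_offset 3145728 1048576 4   # prints 3
ost_slot_for_offset 4194304 1048576 4   # prints 0
```

Consecutive chunks therefore land on different OSTs, which is what allows them to be read or written at the same time.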

Architecture

[Figure: Lustre architecture]

As shown in the picture above, Lustre separates data from metadata. The metadata, which hold information such as data location, data size, permissions, etc., are stored on a Metadata Server (MDS). The data are stored on a set of disks called Object Storage Targets (OSTs), which are managed by Object Storage Servers (OSSs). At the time of writing, the CLEPS Lustre filesystem includes 3 OSSs and a total of 15 OSTs of 14TiB each.

File striping

As explained above, Lustre has the ability to split files into chunks of data (stripes). For a given file, the way these chunks are distributed over the OSTs is called the file’s layout or striping.

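The default striping on /scratch uses a feature called Progressive File Layout (PFL): the layout changes as the file grows. Each size range in the table below is a separate component, so a file larger than 512MiB keeps its first 512MiB on a single OST, and only the data beyond that point uses the next component's settings.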

File size          stripe_count                   stripe_size
0 - 512MiB         1                              1MiB
512MiB - 40GiB     4                              4MiB
More than 40GiB    Number of available OSTs       64MiB
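With PFL, each row of this table is a component covering a range of file offsets: bytes past 512MiB are written with the second component's settings, bytes past 40GiB with the third. The POSIX shell sketch below mirrors the table as a lookup from byte offset to component; the function name is hypothetical, and the lfs command quoted in its comment is only a presumed equivalent, assuming a Lustre version with PFL support.

```shell
# Hypothetical helper mirroring the /scratch PFL table: print the settings of
# the component that covers a given byte offset of a file.
# A stripe_count of -1 means "all available OSTs".
# On a Lustre system with PFL (2.10 or later), a comparable layout could
# presumably be set on a directory with:
#   lfs setstripe -E 512M -c 1 -S 1M -E 40G -c 4 -S 4M -E -1 -c -1 -S 64M <dir>
pfl_component_for_offset() {
    offset=$1
    mib=1048576
    gib=1073741824
    if [ "$offset" -lt $(( 512 * mib )) ]; then
        echo "stripe_count=1 stripe_size=1MiB"
    elif [ "$offset" -lt $(( 40 * gib )) ]; then
        echo "stripe_count=4 stripe_size=4MiB"
    else
        echo "stripe_count=-1 stripe_size=64MiB"
    fi
}

# In a 1GiB file, the first byte and the last byte use different components.
pfl_component_for_offset 0            # prints stripe_count=1 stripe_size=1MiB
pfl_component_for_offset 1073741823   # prints stripe_count=4 stripe_size=4MiB
```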

This layout is set on the /scratch directory and inherited by its subdirectories. It should give close to optimal performance in most use cases. Nevertheless, you can override this default and set your own striping on a file or directory with the lfs setstripe command:

lfs setstripe [--stripe-size|-S stripe_size] [--stripe-count|-c stripe_count]
              [--stripe-index|-i start_ost_index]
              [--pool|-p <pool>] ...
              <dirname|filename>

Parameter            Description
-S stripe_size       Number of bytes stored on one OST before moving on to the
                     next one. A stripe_size of 0 uses the filesystem's default
                     stripe size (1MiB by default). Can be specified with the
                     suffixes k (KiB), m (MiB) or g (GiB).
-c stripe_count      Number of OSTs over which the file is striped (-1 uses all
                     available OSTs).
-i start_ost_index   Index (starting at 0) of the OST on which striping of the
                     file starts. The default, -1, lets the MDS choose
                     (recommended).
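The suffix rule for stripe_size can be made concrete with a small POSIX shell sketch that converts an -S argument into bytes. The function name is hypothetical, the suffixes are assumed to be binary (k = 1024 bytes), and the check that the result is a multiple of 64KiB reflects the usual Lustre requirement on stripe sizes rather than anything stated on this page.

```shell
# Hypothetical helper: convert an lfs-style stripe size ("64k", "4M", "1g")
# into bytes, assuming binary suffixes, and reject sizes that are not a
# multiple of 64KiB (assumed Lustre constraint).
stripe_size_to_bytes() {
    value=$1
    case $value in
        *[kK]) mult=1024;       num=${value%?} ;;
        *[mM]) mult=1048576;    num=${value%?} ;;
        *[gG]) mult=1073741824; num=${value%?} ;;
        *)     mult=1;          num=$value ;;
    esac
    # Only a plain non-negative integer is accepted before the suffix.
    case $num in
        ''|*[!0-9]*) echo "invalid stripe size: $value" >&2; return 1 ;;
    esac
    bytes=$(( num * mult ))
    if [ $(( bytes % 65536 )) -ne 0 ]; then
        echo "stripe size $value is not a multiple of 64KiB" >&2
        return 1
    fi
    echo "$bytes"
}

stripe_size_to_bytes 4M     # prints 4194304
stripe_size_to_bytes 64k    # prints 65536
```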

Examples:

# Create a file with the default layout of its directory
touch testfile

# Set a default layout (4 OSTs, 1MiB stripes) for new files in the current directory
lfs setstripe -c 4 -S 1M .

# Create a file striped over two OSTs with 4MiB stripes
lfs setstripe -c 2 -S 4M testfile2

# Create a file on a single OST with 2MiB stripes
lfs setstripe -c 1 -S 2M testfile3

In the last example, testfile3 is stored on a single OST and written in chunks of 2MiB. Note that lfs setstripe only sets the layout of the file: the file it creates is empty, since its size is not known at creation time.
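The layout only decides where data goes once it is written. As a worked illustration, the POSIX shell sketch below computes how many bytes of a file would end up on each of its OSTs with a plain round-robin layout, for example testfile2 above once 10MiB have been written to it. The function name is hypothetical and PFL components are ignored.

```shell
# Hypothetical helper: bytes stored on each of a file's OSTs, assuming a
# plain round-robin layout (no PFL components).
# Usage: bytes_per_ost_slot <file_size> <stripe_size> <stripe_count>
bytes_per_ost_slot() {
    file_size=$1 stripe_size=$2 stripe_count=$3
    full_stripes=$(( file_size / stripe_size ))   # number of complete chunks
    remainder=$(( file_size % stripe_size ))      # size of the last, partial chunk
    last_slot=$(( full_stripes % stripe_count ))  # OST receiving the partial chunk
    result=""
    slot=0
    while [ "$slot" -lt "$stripe_count" ]; do
        # Each OST gets full_stripes / stripe_count complete chunks, and the
        # first (full_stripes % stripe_count) OSTs get one extra.
        chunks=$(( full_stripes / stripe_count ))
        if [ "$slot" -lt "$last_slot" ]; then
            chunks=$(( chunks + 1 ))
        fi
        bytes=$(( chunks * stripe_size ))
        if [ "$slot" -eq "$last_slot" ]; then
            bytes=$(( bytes + remainder ))
        fi
        result="$result${result:+ }$bytes"
        slot=$(( slot + 1 ))
    done
    echo "$result"
}

# testfile2 layout (-c 2 -S 4M) holding 10MiB of data:
bytes_per_ost_slot 10485760 4194304 2   # prints 6291456 4194304
```

The first OST receives chunks 1 and 3 (the last one only 2MiB long), the second OST receives chunk 2.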

The lfs getstripe command lists striping information of a file or directory:

lfs getstripe [--quiet|-q] [--verbose|-v]
              [--count|-c] [--index|-i | --offset|-o]
              [--size|-s] [--pool|-p] [--directory|-d]
              [--recursive|-r]  ...
              <dirname|filename>

Parameter   Description
-q          Allocation information, not layout
-c          Stripe count (number of OSTs the file is striped over)
-s          Stripe size (how much data is written on one OST before moving to the next)
-p          List the pools to which a file belongs
-d          Show the layout of the directory itself, not of the files it contains
-r          Recurse into all subdirectories

Stripe alignment

To benefit from striping when several processes access the same file in parallel, make sure that each process reads or writes a part of the file located on a different OST, i.e. regions aligned on stripe boundaries. This prevents several processes from competing for the same OST and the same stripe, which gives the best performance.
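The sketch below simulates stripe-aligned parallel writes with background dd processes standing in for the processes of a real parallel program. The values are assumptions: 4 writers and a 1MiB stripe size, as for a file created with lfs setstripe -c 4 -S 1M. It writes to a local temporary file so it runs anywhere; on CLEPS the target would be a file on /scratch.

```shell
# Stripe-aligned parallel writes, simulated with background dd processes.
# Assumed values: 4 writers and a 1MiB stripe size ("lfs setstripe -c 4 -S 1M").
stripe_size=1048576        # stripe size in bytes (1MiB)
nranks=4                   # number of writers, one per OST
target=$(mktemp)           # local stand-in for a striped file on /scratch

rank=0
while [ "$rank" -lt "$nranks" ]; do
    # Block size = stripe size, so seek=$rank starts each write exactly on a
    # stripe boundary: writer r writes chunk r, i.e. the file's OST number
    # r % 4. conv=notrunc stops dd from truncating data written by others.
    dd if=/dev/zero of="$target" bs="$stripe_size" count=1 \
        seek="$rank" conv=notrunc 2>/dev/null &
    rank=$(( rank + 1 ))
done
wait                       # wait for every background writer to finish

written=$(( $(wc -c < "$target") ))
echo "$written bytes written"   # 4 writers x 1MiB = 4194304 bytes
rm -f "$target"
```

Because every write covers exactly one whole stripe, no two writers ever touch the same chunk; with a write size that is not a multiple of the stripe size, neighbouring writers would share stripes at the boundaries.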

You can find more information about stripe alignment [here](https://www.nics.utk.edu/computing-resources/file-systems/io-lustre-tips#stripe-alignment).

External documentation

For those who are interested in more details about how to manage file striping, here are some useful links: