Fast Lustre File System find and disk usage Tools#
The basic Unix commands du and find can take minutes to hours on the Lustre file system.
Because this is impractical for large projects, we provide two fast alternatives, lustre_find and lustre_du, which take from a few seconds to a few minutes even for large searches. For reasons of basic data privacy, your searches are limited to your own projects under /work.
While [luv](https://luv.dkrz.de) already shows what fraction of your files has been unused for a long time, these tools help you find out where those files are actually located.
Note
These tools query a dedicated metadata database in the background. The database gets updated nightly. If you have created, modified, or deleted files during the day, those changes will not appear until after the next update.
All interaction happens through the commands described below; direct database access is neither required nor recommended.
Available Tools#
The following commands are available:
| Command | Purpose |
|---|---|
| lustre_find | Search for files and paths matching a pattern |
| lustre_du | Compute directory usage (sizes, file counts, etc.) |
Each command supports --help for an overview of usage and options.
Common Features#
- Works on projects under /work.
- Wildcards follow the standard shell (GLOB) syntax:
  - * matches any sequence of characters
  - ? matches a single character
- Output can be saved as .parquet, .csv or .json.
- Filtering supports DuckDB SQL predicates, e.g. "size > 1e6 AND atime_ms < '2025-11-01'". Both kinds of quotes are needed: double quotes around the whole predicate and single quotes around the date.
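The two wildcards can be tried out locally before running a search. The following is a minimal sketch using the shell's own pattern matching; `glob_match` is a hypothetical helper written for this illustration and is not part of the tools. In a `case` statement `*` also matches across `/`; whether the tools treat `/` the same way is an assumption and is not stated here.

```shell
#!/bin/sh
# glob_match PATTERN PATH: succeed if PATH matches the shell GLOB PATTERN.
# Hypothetical helper for illustration only; not part of lustre_find.
glob_match() {
    # In a `case` statement, `*` matches any sequence of characters
    # (including "/") and `?` matches exactly one character.
    case "$2" in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# Patterns are single-quoted so the shell passes them on unexpanded.
glob_match '/work/ik1017/*.nc'    '/work/ik1017/a/b/tas.nc' && echo "match"
glob_match '/work/ik1017/run?.nc' '/work/ik1017/run1.nc'    && echo "match"
glob_match '/work/ik1017/run?.nc' '/work/ik1017/run10.nc'   || echo "no match"
```

The last call prints "no match" because `?` stands for exactly one character, while `run10` would need two.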
The following fields can be added via --add or used in --filter:
| Field | Description |
|---|---|
| size | File size in bytes |
| | Change time (inode metadata change), in milliseconds; accepts dates as quoted 'YYYY-MM-DD' strings |
| | Modification time (content change), in milliseconds |
| atime_ms | Last access time, in milliseconds |
| | Creation time, in milliseconds |
| | Full absolute path |
| | Inode number (unique per filesystem) |
| uid | User ID of the file owner |
| gid | Group ID of the file owner |
| | Project ID used for project-based quotas |
| | File type and permissions (POSIX |
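Because the time fields are stored in milliseconds, it can help to convert between calendar dates and millisecond values when reading results. The sketch below assumes the values count milliseconds since the Unix epoch in UTC, which is not confirmed here; `date_to_ms` and `ms_to_date` are hypothetical helper names, and GNU date (standard on Linux) is required.

```shell
#!/bin/sh
# Hypothetical helpers; they assume epoch milliseconds in UTC and GNU date.

# date_to_ms YYYY-MM-DD: print milliseconds since the Unix epoch at 00:00 UTC.
date_to_ms() {
    seconds=$(date -u -d "$1" +%s) || return 1
    echo "$((seconds * 1000))"
}

# ms_to_date MS: print the UTC timestamp for a millisecond value.
ms_to_date() {
    date -u -d "@$(( $1 / 1000 ))" '+%Y-%m-%d %H:%M:%S'
}

date_to_ms 2025-11-01      # prints 1761955200000
ms_to_date 1761955200000   # prints 2025-11-01 00:00:00
```

In filters this conversion is usually unnecessary, since dates can be given as quoted 'YYYY-MM-DD' strings.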
lustre_find#
lustre_find locates paths or files matching a given pattern. It behaves
similarly to find on Linux but is backed by the Lustre metadata database,
which is significantly faster for large directories.
Examples
Warning
When using wildcards (*, ?), quote the pattern to avoid shell
interpretation.
You can only search within your own projects under /work.
# Find all .nc files in a project
lustre_find "/work/ik1017/CMIP6/data/CMIP6/*.nc" # adjust to one of your projects
# Add extra columns to the output (e.g. uid, gid)
lustre_find "/work/ik1017/*.nc" --add uid --add gid
# Filter by file size and access time using SQL. Note the quotes!
lustre_find "/work/ik1017/*.nc" \
--filter "size > 5e6 AND atime_ms >= '2025-06-15'"
# Change the number of rows printed
lustre_find "/work/ik1017/*.nc" --max_rows 50
# Save the result to a file (here CSV; .parquet and .json are also supported)
lustre_find "/work/ik1017/*.nc" --save /your_path_to_store_result/results.csv
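The nested quoting in --filter is easy to get wrong when the predicate is assembled in a script. The following sketch builds a predicate for large files that have not been accessed since a given date and prints the resulting command instead of running it. `unused_filter` is a hypothetical helper; it uses only the size and atime_ms fields and the options shown in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: build a --filter predicate for files larger than
# MIN_BYTES whose last access is before BEFORE_DATE (YYYY-MM-DD).
unused_filter() {
    min_bytes=$1
    before_date=$2
    # Single quotes around the date are part of the SQL predicate.
    printf "size > %s AND atime_ms < '%s'" "$min_bytes" "$before_date"
}

pattern="/work/ik1017/*.nc"             # adjust to one of your projects
filter=$(unused_filter 5e6 2025-06-15)

# Print the command rather than executing it, so the quoting can be checked.
printf 'lustre_find "%s" --filter "%s" --max_rows 50\n' "$pattern" "$filter"
```

Once the printed command looks right, it can be pasted into the terminal or the `printf` line can be replaced by the actual call.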
Key Options
| Option | Description |
|---|---|
| | GLOB-like path (use |
| | Project name (e.g. |
| --add | Show additional metadata columns (e.g. |
| --filter | SQL predicate, e.g. |
| --max_rows | Show at most N rows |
| --save | Save results to |
lustre_du#
lustre_du provides fast, directory-level summaries similar to the UNIX
du command. You can view total size, number of files, or aggregated
metadata for any directory prefix.
Examples
Warning
You can only search within your own projects under /work.
# Show total size of a directory in /work
lustre_du /work/ik1017/ # adjust to one of your projects
# Limit display to largest subdirectories
lustre_du /work/ik1017/ --max_rows 20
# Get additional information through added columns (e.g. uid)
lustre_du /work/ik1017/ --add uid
Key Options
| Option | Description |
|---|---|
| | Path in |
| --filter | SQL predicate applied before counting, e.g. |
| --add | Additional columns to aggregate (repeatable). Allowed aggs: |
| --max_rows | Limit number of rows shown for subdirectories |
| --save | Save results; directory (Parquet) or file path with |