Getting Started with slk#

file version: 20 June 2025

current software versions: slk version 3.3.91; slk_helpers version 1.16.4; slk wrappers version 2.4.0

Introduction#

slk and slk_helpers are the command line interfaces that have to be used to interact with the HSM system StrongLink (tape archive) at DKRZ. slk can be used to archive data to tape and to list, search and retrieve archived data on/from tape. It is provided by StrongLink. The slk_helpers provide additional functionality and are developed and maintained at DKRZ. Before you can use the slk and/or slk_helpers commands for the first time, you must call slk login and log in with your DKRZ/luv login credentials.

The slk and slk_helpers commands behave very similarly to well-known commands on the Linux terminal. Not all slk and slk_helpers commands are listed below. Please have a look at the slk manual and the slk_helpers manual for lists of all available commands.

cp => slk archive and slk_helpers retrieve
ls => slk list and slk_helpers list_search
rm => slk delete
mv => slk move and slk rename
chmod => slk chmod
chgrp => slk group
chown => slk owner (only admins)
find => slk search
mkdir => slk_helpers mkdir (slk archive automatically creates namespaces)
sha512sum => slk_helpers checksum -t sha512
test -e => slk_helpers exists (also returns the resource id)
get file size, e.g. equal to stat --printf="%s" => slk_helpers size

A folder or directory is called a namespace in StrongLink.

Please do not run slk archival and slk retrieval on Levante login nodes if you plan to transfer more than a few GB. Instead, please run them on the shared or interactive nodes as described here and here, respectively, and request 6 GB of memory (--mem=6GB). If your slk is killed with a message like /sw/[...]/bin/slk: line 16: [...] Killed, then your slk was killed because too little memory was available.

Some slk commands do not print textual output to the SLURM log when used in batch jobs. Therefore, we strongly recommend capturing the exit code $? of slk as described in the section slk in SLURM jobs. In SLURM jobs, slk archive prints its double verbose output (-vv) to the SLURM log but not its single verbose output (-v). Additionally, some slk and slk_helpers commands print a summary report to the slk log file (~/.slk/slk-cli.log) which contains information on whether the commands finished successfully or not.

Start using slk and slk_helpers#

Load slk module#

The slk and slk_helpers are available as a module on all Levante nodes. Just do

$ module load slk

to load the most recent program versions. In the past, slk and slk_helpers were available in separate modules. These modules have been merged. The new module contains three components:

slk
slk_helpers
sbatch wrapper scripts for slk and slk_helpers

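If you wish to use a specific old release (e.g. slk_helpers version 1.10.2), please append the version to the module name. module avail slk lists the module versions which are installed.

$ module load slk/<version>

For example, the module which contains slk_helpers version 1.10.2 loads slk version 3.3.91, slk_helpers version 1.10.2 and the sbatch wrappers version 1.2.1.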

Please be aware that old slk and slk_helpers releases are not officially supported anymore, contain more bugs and might not be 100% compatible with the current version of StrongLink.

Login#

Call slk login and use your common DKRZ credentials to log in. Unlike pftp, slk login does not open another shell but creates a login token. This login token allows the usage of the other slk commands. It is located in ~/.slk/config.json and is valid for 30 days.

$ slk login

All Levante nodes share the same home directory. Therefore, you need to log in only once every 30 days and you can do so on any Levante node.
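A rough way to see whether a new login might be due is to look at the age of the token file. The sketch below assumes that ~/.slk/config.json is rewritten by each slk login, so that its modification time approximates the time of the last login. This is an assumption and not documented behaviour, and the helper name token_file_is_fresh is hypothetical.

```shell
# Hypothetical helper (not part of slk or slk_helpers).
# Succeeds if FILE exists and was modified less than MAX_DAYS days ago.
# Assumption: the token file is rewritten whenever 'slk login' is run.
token_file_is_fresh() {
    # find prints the path only if the file is younger than MAX_DAYS * 24 h;
    # a missing file yields no output and, hence, a non-zero return code
    [ -n "$(find "$1" -mtime "-$2" 2>/dev/null)" ]
}

# Usage: warn if the token file is missing or older than 30 days.
#   token_file_is_fresh ~/.slk/config.json 30 || echo "please run 'slk login'"
```

The check only looks at the file age and does not read the token itself, so it cannot tell whether the token is actually still accepted by StrongLink.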

Reminder login token expires#

If you wish to be reminded when the login token is due to expire, you can set up a SLURM job that checks the expiration date of the login token on a daily basis. You might use this script for this purpose. It sends an email to you when the token expires in 30, 5, 4, 3, 2 and 1 days. The email is sent to the email address which is stored in your user profile at https://luv.dkrz.de . The whole output is written into the file slk_token_expire_check.log, which is automatically kept below a size of 1100 lines. When finished, the script submits itself to start again the next day at the same time. Please submit the script as follows, replacing YOUR_PROJECT_ACCOUNT by a valid project/account with compute time:

sbatch --begin="now+1day" --account=YOUR_PROJECT_ACCOUNT ./slk_token_expire_email.sh

slk in SLURM jobs#

Please capture the exit code ($?) of each slk command which is run in a SLURM/batch job because some slk commands do not print textual output when used in batch jobs (see How do I capture exit codes?; example: example batch scripts).
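Capturing the exit code can be wrapped into a small shell function so that every slk call in a batch script is checked in the same way. The following is a minimal sketch. The function name run_checked is hypothetical and not part of slk, and the paths in the usage comment are placeholders.

```shell
# Hypothetical helper (not part of slk): run a command, capture its exit
# code ($?) and write a one-line report, so that the SLURM log contains
# information even if the command itself prints nothing.
run_checked() {
    run_checked_rc=0
    "$@" || run_checked_rc=$?
    if [ "${run_checked_rc}" -ne 0 ]; then
        echo "command failed with exit code ${run_checked_rc}: $*" >&2
    else
        echo "command finished successfully: $*"
    fi
    return "${run_checked_rc}"
}

# Usage in a batch script: stop the job if the archival fails.
#   run_checked slk archive -vv some_file.nc /arch/ab01234/c567890/my_data_3/ || exit 1
```

The report for a failed command goes to stderr, so it ends up in the SLURM error log if stdout and stderr are written to separate files.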

Please run slk archive, slk retrieve (deprecated) and slk_helpers retrieve with 6 GB of memory (--mem=6GB in batch jobs). Detailed diagnostic information is printed into the slk log file: ~/.slk/slk-cli.log. If your slk is killed with a message like /sw/[...]/bin/slk: line 16: [...] Killed, then please inform the DKRZ support (support@dkrz.de) and re-run the slk command with 8 GB or 10 GB of memory.

slk memory footprint#

slk archive, slk retrieve (deprecated) and slk_helpers retrieve might use up to 4 GB of memory and cause high CPU load when large files or many files are archived/retrieved. Running these tasks on the login nodes might affect other users working on these nodes. Hence, we encourage all users to perform these tasks on nodes of the compute, interactive or shared partitions and with a sufficient amount of memory allocated (--mem=6GB; see Batch Script Examples).

slk_helpers retrieve can automatically submit a SLURM job (--run-as-slurm-job-with-account) and we provide wrapper scripts for retrievals which we encourage you to use instead of slk retrieve.

slk data transfer rate#

Transferring files with a single call of slk archive, slk retrieve (deprecated) or slk_helpers retrieve reaches transfer rates of up to 1 GB/s and might even slightly exceed this rate under favorable conditions. The transfer rate can be lower if the StrongLink constellation is under high load. Running several transfer commands in parallel on one node will not necessarily increase the total transfer rate because at some point the transfer rate is limited by the hardware. Please make sure to allocate sufficient memory. Generally, please avoid running too many retrieval processes in parallel but try to bundle them. When you need five or more files from the tape archive, please use our new recall/retrieve watcher scripts described here.

Archival#

The slk archive command is available on all nodes of Levante and can be used to archive files or directories. Similar to cp, it prints no output by default. A progress bar is printed when -v is set in interactive mode. A detailed list of all processed files is printed when -vv is set (also in batch mode). slk archive allows the usage of the wildcards *, ? and [...]. Add -R to archive directories recursively. Basic examples of slk archive calls are:

# archive one file, absolute path
$ slk archive /work/bm0146/k204221/some_file.nc /arch/ab01234/c567890/my_data_1/
# archive one file, relative path
$ slk archive some_file.nc /arch/ab01234/c567890/my_data_3/
# archive folder recursively, absolute path
$ slk archive -R /work/bm0146/k204221/some_folder /arch/ab01234/c567890/my_data_4/
# archive folder recursively, relative path, skip hidden files and folders
$ slk archive -x -R some_folder /arch/ab01234/c567890/my_data_5/
# archive multiple files
$ slk archive some_file_a.nc some_file_b.nc some_file_c.nc /arch/ab01234/c567890/my_data_6/
# archive multiple files using wildcards
$ slk archive file_?.nc /arch/ab01234/c567890/my_data_7/
$ slk archive year200[0123].nc /arch/ab01234/c567890/my_data_8/

Please run it via salloc as an interactive job in the partition interactive or in a batch script in the partition shared if you plan to archive a file larger than a few GB. In both cases, 6 GB of memory should be allocated (--mem=6GB).

The optimal file size is between 10 GB and 200 GB. Each file takes up at least 1 GB of archival quota. Retrieving files smaller than 1 GB is very inefficient and, hence, we strongly recommend not archiving smaller files. Instead, we recommend packing small files into tar balls of the optimal size. packems is very useful for this purpose. Hidden files and folders, such as .git or .config, are ignored when -x is set. Please avoid archiving more than approximately 3 TB with one call of slk archive. Archiving larger amounts of data at once might cause slk archive to fail (details: How much data can I archive at once?).
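Before archiving a folder, it can help to see which of its files fall below the 1 GB mark and should rather be packed first. The following is a minimal sketch based on find. The function name list_small_files is hypothetical and the path in the usage comment is a placeholder.

```shell
# Hypothetical helper: list regular files below DIR which are smaller than
# MIN_BYTES (default: 1 GB = 1073741824 bytes).
list_small_files() {
    # the 'c' suffix makes find compare exact byte counts; with '-size -1G'
    # find would round sizes up to whole GB and match only empty files
    find "$1" -type f -size "-${2:-1073741824}c"
}

# Usage: files listed here are candidates for packing into tar balls.
#   list_small_files /work/ab1234/c567890/scenario_xyz/output
```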

When you run slk archive with -vv it will print the archival status of each file it tried to archive.

The group of an archived file is always the default group of the archiving user and not the group of the project folder into which the file is archived. The owner has to change the group manually after archival.

Please check if slk archive finishes successfully. This can be done by one of three means:

look into the textual output of slk archive,
capture the exit code $? (0: successful archival)
look into the slk log file: ~/.slk/slk-cli.log

If the transfer via slk archive is interrupted, an incomplete version of the affected file remains in the HSM. If you re-run the same slk archive command, it transfers only incomplete files (which are overwritten) and missing files. Existing files which have already been archived successfully are skipped. Thus, slk archive works like rsync in this respect.
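Because a repeated slk archive call only transfers incomplete and missing files, a failed archival can simply be started again. The following sketch wraps this into a retry loop. The function name retry_command, the number of attempts and the waiting time are arbitrary, hypothetical choices.

```shell
# Hypothetical helper: run a command up to MAX_ATTEMPTS times until it
# returns exit code 0. Returns the exit code of the last attempt.
# The pause between attempts is taken from RETRY_WAIT_SECONDS (default: 60).
retry_command() {
    retry_max=$1
    shift
    retry_attempt=1
    while true; do
        retry_rc=0
        "$@" || retry_rc=$?
        if [ "${retry_rc}" -eq 0 ]; then
            return 0
        fi
        echo "attempt ${retry_attempt}/${retry_max} failed with exit code ${retry_rc}" >&2
        if [ "${retry_attempt}" -ge "${retry_max}" ]; then
            return "${retry_rc}"
        fi
        retry_attempt=$((retry_attempt + 1))
        sleep "${RETRY_WAIT_SECONDS:-60}"
    done
}

# Usage: try the same archival up to three times.
#   retry_command 3 slk archive -R some_folder /arch/ab01234/c567890/my_data_5/
```

If all attempts fail, the slk log file and the DKRZ support remain the places to look for the cause. Retrying does not replace checking the result.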

Incomplete files are listed by slk list and are not deleted automatically by the StrongLink system. Please do not trust the partial file info printed by slk list but run slk_helpers has_no_flag_partial to generate a complete list of such files. Most files marked as partial file are actually completely archived and are falsely flagged. If repeated runs of slk archive -vv ... indicate that these files are skipped but the partial file indicator remains, please notify us via support@dkrz.de so that we can perform further checks and remove this indicator.

In order to check whether a file actually was completely archived or not, please run a verify job. It checks whether the actual sizes of the files match the expected sizes. Files which are on tape have passed an internal verify job. Once or twice a week, the DKRZ staff searches for incompletely archived files, informs the file owners and moves these files to a specific location. Files which are falsely marked as partial file are ignored in this process.

Verify jobs do not detect bit flips or other defects which do not affect the file size. To catch these, please compare the sha512 checksums of your archived files with those of the original files. StrongLink calculates sha512 checksums directly after archival or while a file is written to tape, depending on the general load of the StrongLink system.
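A checksum comparison could be scripted as sketched below. The function name sha512_matches is hypothetical. The usage comment assumes that the first word printed by slk_helpers checksum is the checksum itself. Please verify the actual output format before relying on it.

```shell
# Hypothetical helper: compare the sha512 checksum of a local file with a
# checksum string obtained elsewhere (e.g. from the tape archive).
sha512_matches() {
    # sha512sum prints "<checksum>  <filename>"; keep only the checksum
    sha512_local=$(sha512sum "$1" | cut -d ' ' -f 1)
    [ "${sha512_local}" = "$2" ]
}

# Usage (assumption: the first word of the output is the checksum):
#   archived_sum=$(slk_helpers checksum -t sha512 /arch/ab1234/file01.nc | cut -d ' ' -f 1)
#   sha512_matches /work/ab1234/my_data/file01.nc "${archived_sum}" || echo "checksum mismatch"
```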

Pack & Archive#

The package packems, which was developed by MPI-M and DKRZ for the HPSS, has been adapted to the StrongLink system / slk. The process of packing & archiving of multiple data files to tape and their retrieval is simplified by this package. It consists of three command line programs:

a pack-&-archive tool packems,
a list-archived-content tool listems and
a retrieve-&-unpack tool unpackems.

Currently, there is a low probability that slk archive and other slk commands fail with connection timeouts or out-of-memory errors. When this happens, parts of the packems workflow may fail, and the failure might be overlooked. Therefore, we propose to run packems twice: first, to generate tar balls and, second, to archive the tar balls and to generate an index. The second run of packems needs sufficient memory to be allocated as mentioned in the comments below. This would be an example workflow:

# set some variables for better understanding
# TODO: please set these four variables according to your needs
source_path=/work/ab1234/c567890/scenario_xyz/output
tmp_path=/scratch/c/c567890/packems_tmp
destination_path=/arch/ab1234/model_output
tar_base_name=scenario_xyz

# load packems module
module load packems slk

# pack data with packems (without archival)
# if you are in a SLURM job on a compute node you can increase `-j 2` to higher values -- e.g. `-j 16`
# TODO: please add additional flags to the packems call according to your needs
packems -j 2 ${source_path} -d ${tmp_path} -o ${tar_base_name} --no-archive

# archive packed data with packems (packing already done previously)
# NOTE: Each running archival process needs approx. 6 GB of memory. This means:
#           * do not run packems on login nodes
#           * allocate sufficient memory with `--mem=<memory>`
#           * set `-j <parallel tasks>` accordingly (`<memory> = <parallel tasks> * 6 GB`)
packems -j 1 ${source_path} -d ${tmp_path} -o ${tar_base_name} --archive-only

For details on the usage of packems please have a look into the packems manual.
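The memory rule from the comments of the example workflow (<memory> = <parallel tasks> * 6 GB) can be evaluated with shell arithmetic. The following is a small sketch. Both function names are hypothetical.

```shell
# Memory in GB to request for the archival run of packems when it is
# started with '-j <parallel tasks>' (6 GB per running archival process).
packems_memory_gb() {
    echo $(( $1 * 6 ))
}

# Largest value for '-j' which fits into a given memory allocation in GB
# (integer division, i.e. rounded down).
packems_max_tasks() {
    echo $(( $1 / 6 ))
}

# Usage:
#   packems_memory_gb 3     # prints 18 => request --mem=18GB and use -j 3
#   packems_max_tasks 13    # prints 2  => use -j 2 with --mem=13GB
```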

List content#

slk list prints the content of a namespace similar to ls -la. By default, file sizes are printed in human-readable form. Please set -b to print the file sizes in bytes.

$ slk list /arch/bm0146/k204221/my_data
-rw-rw-rw-- k204221     bm0146          1.2K   27 Mar 2020  13:18  borehole_01.nc
-rw-rw-rw-- k204221     bm0146          1.2K   04 Mar 2021  10:13  nc3.nc
-rw-rw-rw-- k204221     bm0146          1.2K   04 Mar 2021  09:29  nc_k_2.nc
-rw-rw-rw-- k204221     bm0146          4.0M   04 Mar 2021  17:22  nc_k_3.nc
-rw-rw-rw-- k204221     bm0146          4.0M   04 Mar 2021  10:02  nc_k_4.nc
-rw-rw-rw-- k204221     bm0146         13.1K   08 Dec 2020  22:29  small.nc
-rwxrwx-w-t k204221     bm0146        105.5M   08 Nov 2019  09:05  small_BPb4-Sl-mT_00062104_00040000000_01040000000.AGM07807972.freeze.nc
-rwxrwx-w-t k204221     bm0146        105.5M   14 Nov 2019  03:23  small_BPb4-Sl-mT_00062104_00040000000_01040000000.AGM07807972.nc
-rw-rw-rw-- k204221     bm0146          1.2K   23 Mar 2021  22:41  test.nc
-rw-rw-rw-- k204221     bm0146          1.2K   23 Mar 2021  20:04  zonk.nc
-rw-rw-rw-- k204221     bm0146          1.2K   28 Jun 2021  14:26  some_file.nc
Files: 11

Validate archivals#

This content has been moved to section Validate archivals on the page Archivals to tape.

Retrievals#

Warning

We discourage the usage of slk retrieve and slk recall. Please use slk_helpers retrieve, slk_helpers recall and the retrieve/recall watcher scripts instead.

If you want to access files from the tape archive, we strongly recommend splitting the process into recall (transfer: tape to cache) and retrieve (transfer: cache to you). The recalls are managed by a job scheduler within StrongLink similar to SLURM jobs on Levante. Thus, you submit a recall job and come back later to collect the files. The retrievals, instead, require slk / slk_helpers to actively transfer the data via the Levante node on which you run the retrieval command. While a file is being transferred, it has a temporary filename [filename][transfer-id]slkretrieve. After successful retrieval, it is renamed.
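Since files carry the suffix slkretrieve only while they are being transferred, leftover files with this suffix point to retrievals which are still running or which were interrupted. They can be listed with find as sketched below. The function name list_unfinished_retrievals is hypothetical and the path in the usage comment is a placeholder.

```shell
# Hypothetical helper: list files below DIR whose names end in "slkretrieve",
# i.e. temporary files of running or interrupted retrievals.
list_unfinished_retrievals() {
    find "$1" -type f -name '*slkretrieve'
}

# Usage:
#   list_unfinished_retrievals /work/ab1234/my_data
```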

Note

We provide different tools for retrieving up to 4 files and more than four files. This is because StrongLink does not efficiently organize multiple recall jobs targeting multiple tapes. If you need 40 files, please do not split your request into ten requests of four files each or into 40 requests of one file each. This might cause the whole StrongLink to become slow for all users. The scripts we provide for retrieving more than four files split your request up into subrequests and submit them time-delayed to allow StrongLink to process them efficiently. Examples are given below.

When you need to retrieve up to 4 files, please use slk_helpers recall and slk_helpers retrieve. You might also use slk retrieve, which does both in one step but accepts only one source file/path and has bad error reporting. When you run slk retrieve or slk_helpers retrieve, please do not do it on Levante login nodes because the file transfer requires much memory. Instead, please do it via a batch job on the shared partition or via an interactive batch session on the interactive partition (Run slk in the “interactive” partition). Alternatively, slk_helpers retrieve can generate and submit a SLURM job script automatically.

Example: Basic retrieval workflow for two files.

# check if the files are already in the cache:
$ slk_helpers iscached /arch/ab1234/file01.nc /arch/ab1234/file02.nc -v
# * command lists not-cached files (``-v`` turns this on)

# if files are not cached:
# start recall => copy files from tape to cache
$ slk_helpers recall /arch/ab1234/file01.nc /arch/ab1234/file02.nc -d /work/ab1234/my_data
# * returns a job ID; please write down this job id
# * files, which are already in the cache, are skipped
# * -d <destination> is optional but will prevent that files
#       are recalled which are already in the destination location

# start retrieval SLURM job => copy files from cache to you
$ slk_helpers retrieve /arch/ab1234/file01.nc /arch/ab1234/file02.nc -d /work/ab1234/my_data -v --slurm ab1234
# * can be run directly after recall command
# * this will submit a SLURM job using the compute time account / project 'ab1234'
# * the job will run repeatedly until all requested files are available in the destination
#       specified by ``-d <destination>``
# * automatically generates a log file with details; name of log file is printed to the terminal
# * while the transfer is running, the file has a temporary name in the destination location:
#       "<filename><id>slkretrieve"

# if no file transfer has started after 10 minutes, you should check the job status:
# check job status:
$ slk_helpers job_status <job id>
# if SUCCESSFUL or COMPLETED: retrieval to you should start soon
# if PROCESSING: recall is still running
# if QUEUED: recall is waiting in the StrongLink queue

All possible statuses of recall jobs are listed here.
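In scripts, the status can be translated into a hint as sketched below. Only the four statuses named in the example above are handled explicitly. The function name explain_recall_status is hypothetical, and the usage comment assumes that slk_helpers job_status prints nothing but the status.

```shell
# Hypothetical helper: translate a recall job status into a short hint.
explain_recall_status() {
    case "$1" in
        SUCCESSFUL|COMPLETED)
            echo "recall finished; retrieval to you should start soon" ;;
        PROCESSING)
            echo "recall is still running" ;;
        QUEUED)
            echo "recall is waiting in the StrongLink queue" ;;
        *)
            echo "other status '$1'; please check the list of job statuses" ;;
    esac
}

# Usage (assumption: the command prints only the status):
#   explain_recall_status "$(slk_helpers job_status <job id>)"
```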

When you need to retrieve more than four files, please use our recall / retrieve “watcher” scripts which organize the tape access efficiently and use SLURM jobs automatically.

Example: Basic retrieval workflow for a folder/namespace or 50 files.

# create a folder where multiple files, which are required
# for organizing the recall/retrieval, will be created
mkdir example_folder
cd example_folder

# generate files required by "watchers"
slk_helpers init_watchers /arch/ab1234/file01.nc /arch/ab1234/file02.nc \
        /arch/ab1234/file03.nc /arch/ab1234/file04.nc /arch/ab1234/file05.nc \
        /arch/ab1234/file06.nc /arch/ab1234/file07.nc /arch/ab1234/file08.nc \
        /arch/ab1234/file09.nc -d /work/ab1234/my_data -ns
# command does the same as:
#   slk_helpers gfbt /arch/ab1234/file01.nc ... /arch/ab1234/file09.nc -wf1 /work/ab1234/my_data -v
#
# The command checks for each file on which tape it is stored and puts all files,
# which are on one tape, into an individual list. These lists are stored in the
# current folder.

# let the "watchers" run ('ab1234' is a DKRZ compute time project):
slk_helpers start_watchers ab1234
# This command will start two scripts which will submit one SLURM job each:
#  * one recall watcher
#  * one retrieval watcher

# check the recall status:
less recall.log
# The recall watcher will keep up to four recall jobs running in parallel.

# check the retrieval status:
less retrieve.log
# The retrieve watcher will try to get a file as soon as the recall watcher
# submitted a recall job. Therefore, the retrieve watcher might report, that
# files are not cached yet.

# stop watchers
slk_helpers stop_watchers

You can also write the paths of all files you need into one file and hand its content over to slk_helpers init_watchers. While slk_helpers recall, slk_helpers retrieve and a few other commands allow content to be piped into them, slk_helpers init_watchers and slk_helpers gfbt do not allow this. Please do this instead, assuming your file list is stored in file_list.txt:

slk_helpers init_watchers `cat file_list.txt | tr '\n' ' ' | tr -d '\r'` -d /work/ab1234/my_data -ns
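File lists which were edited on Windows or which contain empty lines can be cleaned before they are handed over. The following sketch removes carriage returns and blank lines. The function name clean_file_list is hypothetical, and the approach assumes that none of the paths contains whitespace.

```shell
# Hypothetical helper: print the entries of a file list, one per line,
# without carriage returns (Windows line endings) and without blank lines.
clean_file_list() {
    tr -d '\r' < "$1" | grep -v '^[[:space:]]*$'
}

# Usage (relies on word splitting; paths must not contain whitespace):
#   slk_helpers init_watchers $(clean_file_list file_list.txt) -d /work/ab1234/my_data -ns
```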

For details and more examples, please check out the retrievals doc page.

Move, rename and delete files or namespaces#

Move a file from one namespace to another. The file’s name cannot be changed by this command.

$ slk move /arch/bm0146/k204221/my_data/nc_k_2.nc /arch/bm0146/k204221/old_data

Warning

slk move automatically overwrites target files if they exist (similar to mv). Please run slk move with -i to avoid this.

Rename a file. The file’s location cannot be changed by this command.

$ slk rename /arch/bm0146/k204221/my_data/nc_k_3.nc a_netcdf_file.nc

Delete a file. If slk delete is applied to a namespace, it deletes all files in this namespace without confirmation. Files in sub-namespaces are not deleted. To delete the whole namespace with all sub-, subsub-, …-namespaces and their content, please append -R. You may supply more than one file to slk delete to delete more than one file at once.

$ slk delete /arch/bm0146/k204221/my_data/zonk.nc

Search files#

Files stored in StrongLink can be searched via search queries, which are written in JSON. You can use jq to print the search queries in a human-readable way. Details on the format of search queries are given in Metadata in StrongLink.

The slk provides search functionality via the command slk search. After running a search query, it returns a search ID that can be used with slk list to list the found data. The commands slk_helpers recall, slk_helpers retrieve and slk_helpers init_watchers accept search ids as input when --search-id is set. Below you will find three example tasks for searches. More examples and explanations are given here: Metadata in StrongLink.

Task: find all *.nc files in the namespace /ex/am/ple/data

slk search '{"$and": [{"path": {"$gte": "/ex/am/ple/data"}}, {"resources.name": {"$regex": ".nc$"}}]}'

Task: find all netCDF files, which have a global attribute project with the value ABC. For netcdf.Project below please note that netcdf is written lower case and Project starts with an upper case P although the global attribute might have been written with lower case p.

slk search '{"netcdf.Project": "ABC"}'

Task: find all INDEX.txt files that either belong to user k204221 (uid: 25301) in /arch or that are stored in the namespace /double/bm0146.

slk search '{"$and": [{"resources.name": "INDEX.txt"}, {"$or": [{"$and": [{"resources.posix_uid": 25301}, {"path": {"$gte": "/arch"}}]}, {"path": {"$gte": "/double/bm0146"}}]}]}'

The slk_helpers provide the command gen_file_query with which simple file queries can be generated. The query for the first task could be generated as follows:

slk_helpers gen_file_query '/ex/am/ple/data/.nc$'
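If the same kind of query is needed for several namespaces, the JSON string can also be assembled in the shell. The following sketch reproduces the structure of the query for the first task. The function name build_name_query is hypothetical, and the arguments are inserted as they are, so they must not contain double quotes or backslashes.

```shell
# Hypothetical helper: print a search query which matches all files below
# NAMESPACE whose names match REGEX (same structure as the query for the
# first task above). No JSON escaping is applied to the arguments.
build_name_query() {
    printf '{"$and": [{"path": {"$gte": "%s"}}, {"resources.name": {"$regex": "%s"}}]}\n' "$1" "$2"
}

# Usage:
#   slk search "$(build_name_query /ex/am/ple/data '.nc$')"
```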

Run slk in the “interactive” partition#

If you want to archive/retrieve large files or many files interactively, please use slk archive / slk_helpers retrieve via the interactive partition and allocate 6 GB of memory. This is done by running this command on one of the login nodes of Levante:

salloc --mem=6GB --partition=interactive --account=YOUR_PROJECT_ACCOUNT

Your user account has to be associated with a project account (YOUR_PROJECT_ACCOUNT) which has compute time available. Please contact support@dkrz.de if you are not a member of such a project but need to use salloc or sbatch for data retrieval or archival.

Run slk as batch job#

A “batch job” denotes a script or a program which is running on one of the Levante compute or shared nodes. The script/program was submitted via sbatch to the SLURM resource manager.

sbatch MY_JOB_SCRIPT.sh

Please always run slk jobs with 6 GB of memory allocated: either via a parameter to the sbatch command (--mem=6GB) or in the header of the batch script (#SBATCH --mem=6GB).

You can check the status of your jobs (queued, running, finishing, …) via

squeue -u $USER

Details on the usage of sbatch and similar SLURM-related commands are given in SLURM Introduction. Example batch scripts for archival and retrieval are given in the sections Archival script templates and Retrieval script templates, respectively.

Debugging#

Please have a look into the slk log file in ~/.slk/slk-cli.log for detailed error messages. If you send error reports or questions on failed slk calls to the DKRZ support, please attach your slk log.

pyslk#

Most slk and slk_helpers commands are available as functions pyslk.pyslk.slk_COMMAND(...). These functions are simple wrappers that print out the text which the slk/slk_helpers commands normally print to the command line. Slightly more advanced wrappers are available for a few commands via pyslk.parsers.slk_COMMAND_.... pyslk is installed on Levante in the latest python3 module. It can also be downloaded separately; however, it needs slk to be installed. Details on the availability are listed here: pyslk availability.

A few usage examples:

> import pyslk
> pyslk.list('/arch')
    permissions    owner   group  size  day  month  year  filename
0   drwxrwxr-x-  a270003  aa0049         06   Aug   2021  10:17    aa0049
1   drwxrwxr-x-  a270003  aa0238         06   Aug   2021  07:04    aa0238
2   drwxrwx-w--  a270003  ab0036         06   Aug   2021  10:13    ab0036
3   drwxrwx-w--  a270003  ab0051         06   Aug   2021  11:56    ab0051
4   ...
> pyslk.version()
SCLI Version 3.3.21