Getting Started with slk

file version: 14 June 2022

current software versions: slk version 3.3.21; slk_helpers version 1.2.4

Note

Please also read Known Issues before you use slk for the first time.

Introduction

slk and slk_helpers are the command line interfaces for interacting with StrongLink, the HSM system (tape archive) at DKRZ. The previous HSM system HPSS, which was deactivated in October 2021, used pftp as its command line interface; pftp no longer works. slk can be used to archive data to tape and to list, search and retrieve archived data. It is provided by StrongBox Data Solutions. The slk_helpers provide additional functionality and are developed and maintained at DKRZ. Before using the slk and/or slk_helpers commands for the first time, you must run slk login and log in with your DKRZ/luv login credentials.

The slk and slk_helpers commands behave very similarly to well-known Linux terminal commands. Not all slk and slk_helpers commands are listed below; see the slk manual and the slk_helpers manual for lists of all available commands.

  • slk archive and slk retrieve => cp (limited availability of slk retrieve, see Retrievals from tape)

  • slk list and slk_helpers list_search => ls

  • slk delete => rm

  • slk move and slk rename => mv

  • slk chmod => chmod

  • slk group => chgrp

  • slk owner => chown (only admins)

  • slk search and slk_helpers search_limited => find (slk search currently deactivated; please use slk_helpers search_limited)

  • slk_helpers mkdir => mkdir (slk archive automatically creates namespaces)

  • slk_helpers checksum -t sha512 => sha512sum

  • slk_helpers exists => test -e (also returns the resource id)

  • slk_helpers size => print the file size in bytes, equivalent to stat --printf="%s"
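slk_helpers checksum -t sha512 corresponds to sha512sum. To compare an archived file with its local original, the local SHA-512 checksum can also be computed in Python. The following is a minimal sketch using only the standard library; the function name local_sha512 is hypothetical, and the file is read in chunks so that large files do not have to fit into memory.

```python
import hashlib


def local_sha512(path, chunk_size=1024 * 1024):
    """Return the hex SHA-512 digest of a local file, like `sha512sum`.

    Hypothetical helper for illustration; not part of slk or slk_helpers.
    """
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # read in chunks (1 MiB by default) so large files need not fit into memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex digest can then be compared with the checksum printed by slk_helpers checksum -t sha512 for the archived file (see the slk_helpers manual for its exact output format).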

What is commonly called a folder or directory is called a namespace in StrongLink.

Note

slk retrieve is deactivated on the login nodes of Levante.

Warning

Please run archival/retrieval of large/many files on compute or interactive nodes and request 6GB of memory (--mem=6GB).

Warning

Please be aware that some slk commands do not print textual output to the SLURM log when used in batch jobs. Please capture the exit code $? of slk as described in the section slk in SLURM jobs.

Load slk module

slk is available as a module on all Levante nodes. Run

$ module load slk

to load the most recent slk version. To use a specific release (e.g. slk version 3.3.21), run

$ module load slk/3.3.21

slk memory footprint

slk archive and slk retrieve might use up to 4 GB of memory and cause high CPU load when large files or many files are archived/retrieved. Running these tasks on the login nodes might affect other users working on these nodes. Hence, we encourage all users to perform these tasks on nodes of the compute, interactive or shared partitions with a sufficient amount of memory allocated. To account for overhead, 6 GB of memory should be allocated in the job script via --mem=6GB (examples: Batch Script Examples; --mem in sbatch Manual: https://slurm.schedmd.com/sbatch.html#OPT_mem).
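The memory recommendation can be written directly into a batch script. The sketch below uses a hypothetical helper slk_job_script with placeholder values for --partition and --account (bm0146 is only the example project used on this page) and generates a minimal script that requests --mem=6GB, loads the slk module and records the exit code of the slk command.

```python
def slk_job_script(slk_command, partition="shared", account="bm0146", mem="6GB"):
    """Return a minimal SLURM batch script that runs one slk command.

    Hypothetical helper; partition and account are placeholders to replace.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --account={account}",
        f"#SBATCH --mem={mem}",  # 6 GB leaves headroom above slk's ~4 GB peak
        "module load slk",
        slk_command,
        # slk may not print to the SLURM log, so record its exit code explicitly
        'echo "slk exit code: $?"',
    ]
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and submitted with sbatch. Further options such as --time may be required; see the Batch Script Examples.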

slk data transfer rate

Transferring files with a single call of slk archive or slk retrieve reaches transfer rates of up to 1 GB/s and might even slightly exceed this rate under favorable conditions. The transfer rate can be lower if the StrongLink constellation is under high load. Running several slk archive or slk retrieve calls in parallel on one node will not necessarily increase the total transfer rate, because at some point the transfer rate is limited by the hardware. We do not yet have enough experience to say how many parallel slk archive / retrieve calls are reasonable. Generally, please avoid running many individual slk retrieve calls in parallel; instead, bundle the retrieval of individual files as suggested in search and retrieval of search results. If you run several calls of slk in parallel, please make sure to allocate sufficient memory.
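As a rough planning aid, the best-case duration of a single transfer follows from the quoted rate of about 1 GB/s. The sketch below takes 1 GB as 10^9 bytes (an assumption) and returns a lower bound only, since the rate drops under load and parallel calls do not scale linearly. The function name min_transfer_seconds is hypothetical.

```python
def min_transfer_seconds(total_bytes, rate_bytes_per_s=1e9):
    """Best-case duration of one slk archive/retrieve call at ~1 GB/s.

    Hypothetical helper. Real transfers can be slower when StrongLink is
    under load, and parallel calls on one node do not add up linearly.
    """
    if rate_bytes_per_s <= 0:
        raise ValueError("rate_bytes_per_s must be positive")
    return total_bytes / rate_bytes_per_s
```

For example, min_transfer_seconds(2e12) gives 2000 s, i.e. at least about 33 minutes for 2 TB.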

slk in SLURM jobs

Please capture the exit code ($?) of each slk command that is run in a SLURM/batch job, because some slk commands do not print textual output when used in batch jobs (see How do I capture exit codes?; example: example batch scripts).

Please run slk archive and slk retrieve with 6 GB of memory (--mem=6GB in batch jobs). Detailed diagnostic information is written to the slk log file ~/.slk/slk-cli.log. The command line output might be misleading in some situations, as documented in the Known Issues.
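If slk is called from a Python-based workflow instead of a shell script, the exit code can be captured with subprocess. The wrapper run_slk below is a hypothetical sketch, not part of slk or pyslk; it prints the exit code so that it appears in the SLURM log even if slk itself prints nothing.

```python
import subprocess
import sys


def run_slk(args):
    """Run a command such as ["slk", "archive", SRC, DST] and print its exit code.

    Hypothetical helper; Python counterpart of `slk ...; echo "exit code: $?"`.
    """
    result = subprocess.run(args)
    # flush so the line shows up in the SLURM log immediately
    print(f"{' '.join(args)} -> exit code {result.returncode}", flush=True)
    return result.returncode
```

In a job script, sys.exit(run_slk(["slk", "archive", src, dst])) makes the job's exit status reflect that of slk. If slk is not on the PATH (e.g. because the module was not loaded), subprocess.run raises FileNotFoundError instead of returning an exit code.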

Login

Call slk login and log in with your common DKRZ credentials. Unlike pftp, slk login does not open another shell; instead, it creates a login token that allows you to use the other slk commands. The token is stored in ~/.slk/config.json and is valid for 30 days.

$ slk login

All Levante nodes share the same home directory. Therefore, you need to log in only once every 30 days, and you can do so on any Levante node.
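To check in a script whether a new slk login is probably due, the age of ~/.slk/config.json can serve as a heuristic. This sketch assumes that the file is (re)written at each login, which is not documented here; token_probably_expired is a hypothetical name, and running slk login remains the authoritative way to obtain a valid token.

```python
import os
import time

TOKEN_LIFETIME_S = 30 * 24 * 3600  # slk login tokens are valid for 30 days
CONFIG = os.path.expanduser("~/.slk/config.json")


def token_probably_expired(path=CONFIG, now=None):
    """Heuristic (hypothetical helper): treat the token as expired if the
    config file is missing or was last modified more than 30 days ago.

    Assumes the file is written at login; other writes would reset the age.
    """
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > TOKEN_LIFETIME_S
```

A batch script could, for example, stop early with a message asking the user to run slk login when token_probably_expired() returns True.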

Archival

slk archive can be used to archive files or directories. A progress bar is printed when slk archive is used interactively. If the command is used in a batch script, its exit code should be captured and printed, because no output will be written into the SLURM job log. The variable $? holds the exit code of the preceding command (see How do I capture exit codes?; extended examples: Archival script templates and Retrieval script templates). An example slk archive call:

$ slk archive /work/bm0146/k204221/some_file.nc /arch/bm0146/k204221/my_data/

slk archive supports * as a wildcard (but not ? or [...]). Add -R to archive directories recursively. Without -R, slk archive works semi-recursively (see Known Issues for details).
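When archival calls are assembled in a script, the wildcard restriction can be checked before slk is called. The hypothetical helper archive_command below builds the argument list for slk archive, rejects ? and [ in source paths (even if they are part of a literal file name), and assumes that -R may be placed before the paths. It does not expand wildcards itself; local patterns can be expanded beforehand, e.g. with Python's glob module.

```python
from typing import Iterable, List


def archive_command(sources: Iterable[str], target: str, recursive: bool = False) -> List[str]:
    """Build the argument list for `slk archive` (hypothetical helper).

    Only the * wildcard is supported by slk archive, so ? and [ are rejected.
    """
    sources = list(sources)
    for src in sources:
        if "?" in src or "[" in src:
            raise ValueError(f"unsupported wildcard in {src!r}; only * is allowed")
    cmd = ["slk", "archive"]
    if recursive:
        cmd.append("-R")  # recursive archival of directories (placement assumed)
    return cmd + sources + [target]
```

The resulting list can be passed to subprocess.run, together with exit-code handling as described in slk in SLURM jobs.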

Warning

If slk archive terminated unexpectedly, all archived files should be checked for completeness. Files listed by slk list are not necessarily complete.
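One quick completeness check after an interrupted archival is to compare the size of each local file with the size of its archived counterpart as reported by slk_helpers size. The sketch below assumes that slk_helpers size prints only the byte count (like stat --printf="%s"); the function names are hypothetical. Equal sizes do not prove that the content is identical, so comparing SHA-512 checksums is the stronger test.

```python
import os
import subprocess


def archived_size(path):
    """Size in bytes reported by `slk_helpers size PATH`.

    Hypothetical helper; assumes the command prints only the integer byte count.
    """
    out = subprocess.run(["slk_helpers", "size", path],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.strip())


def size_mismatches(pairs, get_archived_size=archived_size):
    """Return the (local, archived) path pairs whose sizes differ."""
    return [(local, remote) for local, remote in pairs
            if os.path.getsize(local) != get_archived_size(remote)]
```

The get_archived_size parameter makes it possible to plug in another size source, e.g. sizes collected earlier.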

List content

slk list automatically prints its findings in pagination mode with 25 items per page: you see the first 25 results and have to press Return/Enter to show the next 25. Pagination is deactivated when the output of slk list is piped (|) into another command, e.g. cat, less or more.

$ slk list /arch/bm0146/k204221/my_data | cat
-rw-rw-rw-- k204221     bm0146          1.2K   27 Mar 2020  borehole_01.nc
-rw-rw-rw-- k204221     bm0146          1.2K   04 Mar 2021  nc3.nc
-rw-rw-rw-- k204221     bm0146          1.2K   04 Mar 2021  nc_k_2.nc
-rw-rw-rw-- k204221     bm0146          4.0M   04 Mar 2021  nc_k_3.nc
-rw-rw-rw-- k204221     bm0146          4.0M   04 Mar 2021  nc_k_4.nc
-rw-rw-rw-- k204221     bm0146         13.1K   08 Dec 2020  small.nc
-rwxrwx-w-- k204221     bm0146        105.5M   08 Nov 2019  small_BPb4-Sl-mT_00062104_00040000000_01040000000.AGM07807972.freeze.nc
-rwxrwx-w-- k204221     bm0146        105.5M   14 Nov 2019  small_BPb4-Sl-mT_00062104_00040000000_01040000000.AGM07807972.nc
-rw-rw-rw-- k204221     bm0146          1.2K   23 Mar 2021  test.nc
-rw-rw-rw-- k204221     bm0146          1.2K   23 Mar 2021  zonk.nc
-rw-rw-rw-- k204221     bm0146          1.2K   28 Jun 2021  some_file.nc
Files: 11
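To process slk list output in a script, its columns can be split into fields. The sketch below assumes the column layout shown above (permissions, owner, group, size, day, month, year, name), that namespaces have no size column (as in the pyslk example further down), that names contain no spaces, and that the summary line starts with "Files:". The parser parse_slk_list is hypothetical; pyslk's parsers module offers similar functionality.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    permissions: str
    owner: str
    group: str
    size: str  # human-readable as printed, e.g. "1.2K"; "" for namespaces
    date: str  # e.g. "27 Mar 2020"
    name: str


def parse_slk_list(text: str) -> List[Entry]:
    """Parse the plain-text output of `slk list` into Entry records (hypothetical)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # skip empty lines and the summary line ("Files: 11")
        if not fields or fields[0] == "Files:":
            continue
        if len(fields) == 8:  # file: has a size column
            perm, owner, group, size, day, month, year, name = fields
        elif len(fields) == 7:  # namespace: no size column
            perm, owner, group, day, month, year, name = fields
            size = ""
        else:
            raise ValueError(f"unexpected slk list line: {line!r}")
        entries.append(Entry(perm, owner, group, size, f"{day} {month} {year}", name))
    return entries
```

The text can come from subprocess.run(["slk", "list", namespace], capture_output=True, text=True).stdout; since the output is captured through a pipe, pagination should not interfere.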

Validate archivals

This content has been moved to section Validate archivals on the page Archivals to tape.

Retrieve a file or namespace

slk retrieve is the counterpart of slk archive. It behaves much like slk archive (progress bar; -R), but the * wildcard works only for files and not for namespaces. Currently (May 2022), slk retrieve is only available on nodes of the slk, compute, shared and interactive partitions on Levante. If you wish to use slk retrieve interactively, please start an interactive session on the interactive partition with salloc (see Data Processing on Levante).

$ slk retrieve /arch/bm0146/k204221/my_data/nc3.nc /work/bm0146/k204221/results
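A script can check whether it runs on a partition that offers slk retrieve before calling it. This sketch relies on the SLURM_JOB_PARTITION environment variable, which SLURM sets inside batch jobs (and, we assume, in salloc sessions); on login nodes it is normally unset. The partition set mirrors the May 2022 state described above, and retrieve_allowed is a hypothetical name.

```python
import os

# partitions on which slk retrieve is available (May 2022 state, see above)
RETRIEVE_PARTITIONS = {"slk", "compute", "shared", "interactive"}


def retrieve_allowed(environ=os.environ):
    """True if running in a SLURM job on a partition that offers slk retrieve.

    Hypothetical helper; relies on SLURM_JOB_PARTITION being set by SLURM.
    """
    return environ.get("SLURM_JOB_PARTITION") in RETRIEVE_PARTITIONS
```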

Running many parallel retrievals is inefficient when data have to be read from tape. Instead, we suggest performing a search and then retrieving the search results, as shown in this example script. This decreases the overall time needed for the retrieval because the files are read from tape more efficiently. When the requested files are already in the HSM cache and do not have to be read from tape, many small retrievals do not cause a significant performance decline.
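Bundling can start from a list of wanted files: grouping them by namespace yields one search query per namespace instead of one retrieval per file. The sketch below is hypothetical and builds the queries in the JSON format shown under Search files, combining the file names with $or. Depending on how StrongLink evaluates $gte on path, files of the same name in sub-namespaces may also match, and each query must stay below the 1000-result limit of slk_helpers search_limited.

```python
import json
import posixpath
from collections import defaultdict
from typing import Dict, Iterable


def bundle_queries(paths: Iterable[str]) -> Dict[str, str]:
    """Group archive paths by namespace and build one search query per namespace.

    Hypothetical helper for illustration.
    """
    names_by_namespace = defaultdict(list)
    for path in paths:
        namespace, name = posixpath.split(path)
        names_by_namespace[namespace].append(name)
    queries = {}
    for namespace, names in names_by_namespace.items():
        query = {"$and": [
            {"path": {"$gte": namespace}},
            # match any of the wanted file names
            {"$or": [{"resources.name": name} for name in names]},
        ]}
        queries[namespace] = json.dumps(query)
    return queries
```

Each query can then be passed to slk_helpers search_limited, and the returned search ID used with slk retrieve.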

Note

We currently recommend using (Lustre) striping on Levante. Some folders are already striped, so please check in advance.

All retrieved files get the permissions rw------- and have to be adapted manually. Setting umask or setfacl in advance has no effect.
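The permissions can be adapted with chmod, e.g. chmod -R g+rX on the target directory to share the data with your group. The same can be done in Python; this sketch (hypothetical name add_group_read) adds group read permission to files and group read and search permission to directories, which is one possible choice rather than a requirement.

```python
import os
import stat


def add_group_read(root: str) -> int:
    """Add group read to files and group read+search to directories below
    (and including) root, similar to `chmod -R g+rX root` for files retrieved
    as rw-------. Hypothetical helper; returns the number of files changed."""
    dir_bits = stat.S_IRGRP | stat.S_IXGRP
    os.chmod(root, stat.S_IMODE(os.stat(root).st_mode) | dir_bits)
    n_files = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) | dir_bits)
        for name in filenames:
            path = os.path.join(dirpath, name)
            os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) | stat.S_IRGRP)
            n_files += 1
    return n_files
```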

Move, rename and delete files or namespaces

Move a file from one namespace to another. The file’s name cannot be changed by this command.

$ slk move /arch/bm0146/k204221/my_data/nc_k_2.nc /arch/bm0146/k204221/old_data

Rename a file. The file’s location cannot be changed by this command.

$ slk rename /arch/bm0146/k204221/my_data/nc_k_3.nc a_netcdf_file.nc

Delete a file. If slk delete is applied to a namespace, it deletes all files in this namespace without confirmation; files in sub-namespaces are not deleted. To delete the whole namespace including all nested sub-namespaces and their content, append -R.

$ slk delete /arch/bm0146/k204221/my_data/zonk.nc

Search files

Files stored in StrongLink can be searched via search queries, which are written in JSON. You can use jq to print the search queries in a human-readable way. Details on the format of search queries are given in Metadata in StrongLink.

slk provides search functionality via the command slk search. Currently, slk search is deactivated due to an internal issue; please use only slk_helpers search_limited for now. slk_helpers search_limited fails if more than 1000 files are found (details on the slk_helpers manual page). Both search commands return a search ID that can be used with slk list and slk retrieve to list and retrieve, respectively, the found data. Below you will find three example tasks for searches. More examples and explanations are given in Metadata in StrongLink.

Task: find all *.nc files in the namespace /ex/am/ple/data

slk_helpers search_limited '{"$and": [{"path": {"$gte": "/ex/am/ple/data"}}, {"resources.name": {"$regex": ".nc$"}}]}'
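Building queries as Python dictionaries and serialising them with json.dumps avoids quoting mistakes with $ and double quotes in the shell. The sketch below (hypothetical function namespace_regex_query) reproduces the query of the first task.

```python
import json


def namespace_regex_query(namespace: str, name_regex: str) -> str:
    """Query for files below `namespace` whose name matches `name_regex`.

    Hypothetical helper; uses the query format of the first task above.
    """
    query = {"$and": [
        {"path": {"$gte": namespace}},
        {"resources.name": {"$regex": name_regex}},
    ]}
    return json.dumps(query)
```

The string can be passed directly as an argument, e.g. subprocess.run(["slk_helpers", "search_limited", namespace_regex_query("/ex/am/ple/data", ".nc$")]), without any shell quoting.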

Task: find all netCDF files that have a global attribute project with the value ABC. Note that in netcdf.Project below, netcdf is written in lower case and Project starts with an upper-case P, even if the global attribute was written with a lower-case p.

slk_helpers search_limited '{"netcdf.Project": "ABC"}'

Task: find all INDEX.txt files that either belong to user k204221 (uid: 25301) in /arch or that are stored in the namespace /double/bm0146.

slk_helpers search_limited '{"$and": [{"resources.name": "INDEX.txt"}, {"$or": [{"$and": [{"resources.posix_uid": 25301}, {"path": {"$gte": "/arch"}}]}, {"path": {"$gte": "/double/bm0146"}}]}]}'
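Nested queries such as the one above are easier to read when composed from small helper functions. In this sketch, and_, or_ and below are hypothetical helpers; the resulting dictionary is the same query as in the third task.

```python
import json


def and_(*conditions):
    return {"$and": list(conditions)}


def or_(*conditions):
    return {"$or": list(conditions)}


def below(namespace):
    # path condition in the form used in the examples above
    return {"path": {"$gte": namespace}}


# INDEX.txt files owned by uid 25301 in /arch, or stored in /double/bm0146
query = and_(
    {"resources.name": "INDEX.txt"},
    or_(
        and_({"resources.posix_uid": 25301}, below("/arch")),
        below("/double/bm0146"),
    ),
)
print(json.dumps(query))
```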

The slk_helpers provide the command gen_file_query, which generates simple file queries. The query for the first task could be generated as follows:

slk_helpers gen_file_query '/ex/am/ple/data/.nc$'

Debugging

Please check the slk log file ~/.slk/slk-cli.log for detailed error messages. If you send error reports or questions about failed slk calls to the DKRZ support, please attach your slk log file.
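The most recent log entries are usually the relevant ones. The following sketch (hypothetical name tail_slk_log) returns the last n lines of the log file, like tail -n 50 ~/.slk/slk-cli.log, while keeping only those n lines in memory.

```python
import os
from collections import deque

LOG = os.path.expanduser("~/.slk/slk-cli.log")


def tail_slk_log(n=50, path=LOG):
    """Return the last n lines of the slk log (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        # deque with maxlen keeps only the last n lines while iterating
        return list(deque(f, maxlen=n))
```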

packems

packems has been adapted to slk and the new HSM system. Please see the packems manual for details and usage of packems: https://code.mpimet.mpg.de/projects/esmenv/wiki/Packems.

pyslk

Most slk and slk_helpers commands are available as functions pyslk.pyslk.slk_COMMAND(...). These functions are simple wrappers that print the text which the slk/slk_helpers commands normally print to the command line. Somewhat more advanced wrappers are available for a few commands via pyslk.parsers.slk_COMMAND_.... pyslk is installed on Levante in the latest python3 module. It can also be downloaded separately, but it requires slk to be installed. Details on its availability are listed here: pyslk availability.

A few usage examples:

> from pyslk import pyslk as pslk
> from pyslk import parsers as psr
> pslk.slk_list('/arch')
drwxrwxr-x- 7003        1001                   06 Aug 2021  aa0049
drwxrwxr-x- 7003        1151                   06 Aug 2021  aa0238
drwxrwx-w-- 7003        1079                   06 Aug 2021  ab0036
drwxrwx-w-- 7003        1007                   06 Aug 2021  ab0051
...
> pslk.slk_version()
SCLI Version 3.3.21
> psr.slk_list_formatted('/arch')
    permissions    owner   group  size  day  month  year  filename
0   drwxrwxr-x-  a270003  aa0049         06   Aug   2021    aa0049
1   drwxrwxr-x-  a270003  aa0238         06   Aug   2021    aa0238
2   drwxrwx-w--  a270003  ab0036         06   Aug   2021    ab0036
3   drwxrwx-w--  a270003  ab0051         06   Aug   2021    ab0051
4   ...
>