file version: 11 November 2021
You are welcome to join our weekly Q&A session every Thursday from 11:30 AM to 12:30 PM: https://global.gotomeeting.com/join/681975669
The DKRZ operates a hierarchical storage management system (HSM) used for storing all relevant data created and post-processed on DKRZ systems. The hardware of the HSM consists of a disk cache and two tape libraries. The primary tape archive is located in the DKRZ building in Hamburg. Selected files are mirrored to the secondary tape archive located at the Max Planck Computing and Data Facility (MPCDF) in Garching. The HSM is operated with the StrongLink software. All command-line user interaction with the tape archive goes through StrongLink and its command line tool slk.
If you have questions that are not answered on this page or on the pages linked under Further reading, please have a look at our FAQ. If you do not find an answer there, please contact us via email@example.com.
Storage options and quota
The amount of data that can be stored in the tape archive per project is limited by the available storage quota of that project. Individual users do not have a quota. Storage space on the HSM is applied for together with the (bi-)annual application for DKRZ compute and storage resources. There is the normal tape archive quota, denoted as arch, and the quota for long-term archival, denoted as docu. Additionally, users may select very important files to be stored twice, i.e. one copy in Hamburg and one copy in Garching. The following table provides an overview:
Storage type | Data retention | How to achieve this
single copy on tape | until 1 year after expiration of the DKRZ project | default storage type
second copy on separate tape | until 1 year after expiration of the project | store data in a specific root namespace
long-term storage for reference purposes | until 10 years after expiration of the project | please contact firstname.lastname@example.org
The output of slk list contains an extra column which indicates whether a file meant for duplication has already been copied to Garching.
slk: command line tool for tape access
slk stores a login token in the home directory of each user (~/.slk/config.json). The login token is valid for 30 days. By default, this file can only be accessed by the respective user (permissions: 600). However, users should be careful when doing things like chmod 755 * in their home directory. If you suspect that your slk login token has been compromised, please contact email@example.com.
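A minimal sketch, using only the Python standard library, of how one might verify that the token file still has no group or other permission bits set (e.g. after an accidental recursive chmod), restore mode 600 if needed, and estimate when a token obtained at a given login time expires. The function names are hypothetical; for the authoritative expiry time, use slk_helpers session.

```python
import os
import stat
from datetime import datetime, timedelta
from pathlib import Path

# Location and validity period of the slk login token as described above.
TOKEN_FILE = Path.home() / ".slk" / "config.json"
TOKEN_VALIDITY = timedelta(days=30)


def token_is_private(path: Path = TOKEN_FILE) -> bool:
    """Return True if no group or other permission bits are set (e.g. mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def restrict_token(path: Path = TOKEN_FILE) -> None:
    """Reset the token file to owner read/write only (mode 600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def token_expiry(login_time: datetime) -> datetime:
    """Estimate the expiry of a token obtained at login_time (30 days later)."""
    return login_time + TOKEN_VALIDITY


if __name__ == "__main__":
    if TOKEN_FILE.exists() and not token_is_private():
        print(f"Warning: {TOKEN_FILE} is accessible by other users.")
        restrict_token()
        print("Permissions reset to 600; contact DKRZ if the token may have been read.")
```

Resetting the permissions only prevents future access; it does not invalidate a token that someone else may already have copied.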
The StrongLink software comes with the command line tool suite slk.
slk is the user interface to the StrongLink software and allows the user to interact with the HSM. The available commands are:
help: display the help / usage information
version: print the version of slk
login: log in to the system with LDAP credentials
archive: copy files to the HSM
chmod: modify permissions of archived files (same as chmod on the Linux shell)
delete: delete a namespace (and all child objects of the namespace) or a specific file
group: change group ownership of archived files; for file owners and admins only
owner: change ownership of archived files; for admins only
tag: modify metadata of archived files
search: search archived files based on metadata; deactivated
list: list searched files and some of their metadata (similar to ls on the Linux shell)
retrieve: retrieve files based on search result or based on absolute path; deactivated
recall: recall files based on search result or based on absolute path (needed for External Access only)
move: move a file or a namespace from one namespace to another namespace (might be merged with slk rename in future)
rename: rename a file or a namespace (might be merged with slk move in future)
StrongLink uses the term “namespace” or “global namespace” (gns). A “namespace” is comparable to a “directory” or “path” on a common file system.
slk does not provide its own shell: after logging in, the user still navigates through the local file system, i.e. the parallel file system of mistral. slk therefore behaves more like the cp command on the Linux shell. It is also not possible to navigate through the emulated directory structure of the HSM using cd.
Please read Known Issues before you use slk for the first time. For a detailed description of the individual commands, please have a look at StrongLink Command Line Interface (slk) (on doc.dkrz.de) or at the StrongLink Command Line Interface Guide v3.1. Alternatively, the sections Switching: pftp to slk and slk Usage Examples contain several usage examples.
slk helpers: add-on to slk
slk lacks a few minor but very useful features. The slk_helpers program adds these features. Its usage is very similar to that of slk. The following commands are available:
checksum: prints one or all checksums of a resource
exists: checks if a resource exists; the resource id is returned
help: prints help / usage information
hostname: prints the hostname to which slk is currently connected or to which it will connect
mkdir: creates a namespace in an already existing namespace (like mkdir on the Linux shell)
metadata: gets metadata of a resource
resourcepath: gets the path for a resource id
session: prints the time until which the current slk session is valid
size: returns the file size in bytes
Please have a look at slk helpers (slk_helpers) (on doc.dkrz.de) for a detailed description of the individual commands. Alternatively, the section slk Usage Examples contains several usage examples.
pyslk: Python slk wrapper
We offer a Python wrapper package, pyslk, for slk and slk_helpers. Most commands of these two command line interfaces have corresponding wrappers in pyslk. Usage examples of pyslk are shown in the section pyslk in Getting Started with slk. Additionally, a pyslk API reference is provided. Feel free to download the package from pyslk downloads.
External access via sftp
Currently, no alternative to accessing the data via mistral is offered.
See the extra page on External Access.
Metadata: harvesting, manual manipulation and search
The StrongLink software reads and extracts extended file metadata from the headers of archived netCDF files. Users may edit some of these metadata and add further metadata via slk tag. The stored metadata are described in detail on our metadata manual page (Metadata in StrongLink). Additionally, extended metadata are extracted for some common file formats such as JPEG. Harvesting extended metadata from additional file formats used in Earth System modeling is planned.
Files can be searched and found based on their metadata. The StrongLink software provides the command line tools slk search, slk list and slk retrieve to search, list and retrieve files based on their metadata, respectively (slk search is currently deactivated). Retrieval of files based on their absolute path in the HSM is also possible. Please see the sections Metadata in StrongLink, slk Usage Examples and StrongLink Command Line Interface (slk) for details on the usage of slk in this context.
Packing of data
The tape archive delivers its best performance if the files to be archived are sufficiently large. The recommendations on packing developed for HPSS therefore remain valid for the time being. As with the HPSS system, used quota is accounted in increments of 1 GB per archived file.
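The effect of per-file accounting can be estimated with a short calculation. The Python sketch below is hypothetical and assumes that each archived file is rounded up to the next full GB (so even a tiny file counts as 1 GB) and that 1 GB = 10^9 bytes; the exact accounting rules used by DKRZ may differ.

```python
from typing import Iterable

# Assumption: 1 GB = 10**9 bytes; DKRZ may define the unit differently.
GB = 10**9


def charged_quota_gb(file_sizes_bytes: Iterable[int]) -> int:
    """Quota (in GB) charged when every file is accounted in 1 GB increments.

    Assumption: each file is rounded up to the next full GB, with a
    minimum of 1 GB per file. Integer arithmetic avoids float rounding.
    """
    return sum(max(1, (size + GB - 1) // GB) for size in file_sizes_bytes)
```

Under these assumptions, 1000 files of 10 MB each (10 GB of data in total) are charged 1000 GB, whereas the same data packed into a single archive are charged only on the order of 10 GB.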
packems, which was developed by MPI-M and DKRZ for the HPSS, has been adapted to the new HSM system. This package simplifies packing and archiving multiple data files to tape as well as retrieving them. It consists of three command line programs:
a pack-&-archive tool
a list-archived-content tool
a retrieve-&-unpack tool
Run module load packems on mistral to load the packems package. For details on the usage of packems, please have a look at the packems manual.
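packems is the recommended tool for this task. Purely to illustrate the underlying idea, the following sketch packs all regular files below a directory into one uncompressed tar file with Python's standard tarfile module; the resulting file could then be archived with slk archive. This is not packems, and the function name is hypothetical.

```python
import tarfile
from pathlib import Path


def pack_directory(src_dir: Path, tar_path: Path) -> list[str]:
    """Pack all regular files below src_dir into one uncompressed tar file.

    tar_path should lie outside src_dir. Returns the member names stored
    in the archive, relative to src_dir.
    """
    # Collect the file list before the tar file is created.
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    with tarfile.open(tar_path, "w") as tar:
        for path in files:
            tar.add(path, arcname=path.relative_to(src_dir).as_posix())
    # Re-open the archive to report what was actually stored.
    with tarfile.open(tar_path, "r") as tar:
        return tar.getnames()
```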
Backend data handling
Just like the previous HSM system HPSS, the new system has a fast disk cache installed upstream of the tape system. Files selected for archival are first copied to the disk cache and then successively written onto tape. Files selected for retrieval are first copied from tape to the cache and then copied to the specified target locations. Retrieving files that are still or already stored in the disk cache is considerably faster than retrieving files that are located on tape only.
The distribution of the files across the disk cache, the primary tape archive and the secondary tape archive is controlled automatically by StrongLink. Users have no control over the storage location of their data.
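Whether a file is currently held in the disk cache can be seen in the output of slk list: a t or a - is appended to the permissions string of each file. The t indicates that the file is stored on tape only; the - indicates that the file is stored in the cache.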
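When post-processing slk list output in scripts, this flag can be interpreted with a small helper. The Python sketch below is hypothetical: it assumes the permissions string consists of the usual 10 Unix-style characters (e.g. -rw-r--r--) followed by the single appended flag, and it treats t as "stored on tape only" and - as "stored in the cache". Check the actual slk list output before relying on it.

```python
def storage_location(permissions: str) -> str:
    """Interpret the flag appended to a permissions string from slk list.

    Assumption: 10 Unix-style permission characters plus one flag,
    e.g. "-rw-r--r--t" (tape only) or "-rw-r--r---" (in cache).
    """
    if len(permissions) != 11:
        raise ValueError(f"expected 10 permission characters plus a flag: {permissions!r}")
    flag = permissions[-1]
    if flag == "t":
        return "tape"
    if flag == "-":
        return "cache"
    raise ValueError(f"unknown storage flag {flag!r} in {permissions!r}")
```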
- v1.30, 06 December 2021
- v1.29, 18 November 2021
- v1.28, 12 November 2021
- v1.27, 11 November 2021
- v1.26, 01 November 2021
- v1.25, 27 October 2021
- v1.24, 23 October 2021
- v1.23, 15 October 2021
- v1.22, 08 October 2021
- v1.21, 01 October 2021
- v1.20, 29 September 2021
- v1.19, 20 September 2021
- v1.18, 17 September 2021
- v1.17, 17 September 2021
- v1.16, 17 August 2021
- v1.15, 30 July 2021
- v1.14, 12 July 2021
- v1.13, 29 June 2021
- v1.12, 08 June 2021
- v1.11, 06 May 2021
- v1.10, 23 April 2021
- v1.09, 06 April 2021
- v1.08, 12 March 2021
- v1.07, 10 March 2021
- v1.06, 08 March 2021
- v1.05, 23 February 2021
- v1.04, 22 February 2021
- v1.03, 18 February 2021
- v1.02, 12 February 2021
- v1.01, 28 January 2021