FAQ

v1.29, 18 November 2021

General information about the HSM system

What does HSM mean?

HSM stands for Hierarchical Storage Management. At DKRZ, the term refers to the tape archive.

What type of HSM system is used at DKRZ?

The software is called StrongLink and it is developed and supplied by StrongBox Data Solutions (https://www.strongboxdata.com/stronglink).

Does the tape archive hardware also change?

The hardware of the tape archive remains unchanged. The metadata servers and disk cache were replaced by new powerful servers.

Why did DKRZ get a new system?

The contract for the previous HSM system, HPSS, ends in 2021. In addition, HPSS was not designed to cope with the data volumes expected to be produced by DKRZ’s upcoming HPC system Levante. Therefore, DKRZ ran a tender to purchase a new HSM system. The company Cristie Data was chosen to deliver the new HSM system, which is called StrongLink. The new system allows for a higher data throughput to and from tape and is able to cope with the larger data volumes expected to be produced on Levante.

What are the main differences compared to the old system?

From a user’s perspective, a different command line tool has to be used to transfer data in and out of the tape archive. The new tool provides powerful metadata features. For details have a look at “Which new features does the HSM System provide?”. Technically, the new HSM system StrongLink allows for higher data throughput, is more scalable and is more resilient towards hardware failures than the old HPSS system.

Is the new HSM system accessible via pftp?

No. pftp was replaced by a new command line tool, slk. Additionally, a command line tool slk_helpers is provided, which has some features that slk is lacking. slk is developed and maintained by StrongBox Data Solutions, whereas slk_helpers is developed and maintained at DKRZ.

Would it be possible/desirable to use only one command for slk and slk_helpers main classes?

Basically: yes. We decided otherwise because the development for slk is still ongoing and we do not want to clash with the development by StrongBox: features that we implement now, might later be implemented with slight functional changes by StrongBox.

Data Migration

When did the new HSM system go online?

The new HSM system went online on 1 November 2021 at 2 PM.

When did the HPSS go offline?

The HPSS was taken offline on Friday 15 Oct 2021 at 8 AM and is planned to remain offline.

Are my archived data available on the new system?

All data were migrated automatically from HPSS to StrongLink. The only exception is files that originate from the DXUL (Disc EXtended Unix Linux) system and were located on HPSS in the directory /dxul and below. These files will not be migrated.

How do I find out whether I have data from DXUL that have to be copied manually?

If you own or use legacy data created on HLRE-1 (hurrikan) and earlier (before 2010), please check if there are data in /dxul and below. If you are working with data produced on HLRE-2 (blizzard) and HLRE-3 (mistral) then you are probably not affected. However, there might be config or forcing files of more recent simulations still located in /dxul.

How do I access DXUL data after the HPSS is shut down?

It is no longer possible to access the DXUL data via the new HSM system.

How to proceed if I still have DXUL data that need to be kept?

Please contact beratung@dkrz.de if you still need data from DXUL.

Training, Questions and Adaption of Workflows

Has there been an introduction session to the new HSM system and will there be such sessions in future?

Yes, a DKRZ Tech Talk took place on 6 July 2021. The new HSM system and the new command line tool slk were presented there. A recording of the Tech Talk is available on YouTube: https://www.youtube.com/watch?v=JtmelPQ3ypw. We offer an HSM Q&A session each Thursday from 11:30 AM to 12:30 PM from 30 Sept onwards (meeting URL: https://global.gotomeeting.com/join/681975669). We plan a second Tech Talk when the system is fully functional.

Where can I find written documentation about the new HSM system?

The user documentation is available at https://docs.dkrz.de.

Why is no exact time schedule for training and migration published yet?

A Tech Talk that gave a broad overview of the StrongLink system and of slk took place on 6 July 2021 (https://youtu.be/JtmelPQ3ypw). The HPSS was taken offline on 15 Oct (user access deactivated at 8 AM). StrongLink went online on 1 November at 2 PM. We offer an HSM Q&A session each Thursday from 11:30 AM to 12:30 PM from 30 Sept onwards (meeting URL: https://global.gotomeeting.com/join/681975669).

Who do I contact when I have questions or issues regarding the new HSM system and its usage?

Please contact us via support@dkrz.de or join our HSM Q&A session each Thursday from 11:30 AM to 12:30 PM, from 30 Sept onwards (meeting URL: https://global.gotomeeting.com/join/681975669).

Archiving and Retrieval

How do I interact with the new system?

The command line tool for tape access is called slk. Some features are deactivated for now. Additionally, slk is missing a few small but very useful features. Therefore, a tool called slk_helpers was written at DKRZ to add these features. Details on these two tools are provided in the HSM Documentation at https://docs.dkrz.de.

Where can I use slk and slk_helpers?

slk and slk_helpers are installed as module slk on all mistral nodes. On the login nodes (mlogin100 to mlogin108), please use slk archive and slk retrieve only for a few small files of MB size or below. For details on further restrictions and recommendations please have a look into the documentation (Where and how to use slk).

Can I still use pftp to interact with the new HSM system?

No. A new command line tool is provided (please see “How do I interact with the new system?”).

How do I login to the HSM system?

Login is done via the command line tool using your DKRZ credentials (LDAP; like mistral/luv/…). The command line tool stores a login token for a specific time period (currently 30 days) so that you do not even need to go through the process of logging in for that period.

Does Kerberos authentication work on the new HSM system?

No, Kerberos does not work anymore. You only need to provide your login data to the command line tool in certain time intervals.

Do I have to provide my login credentials each time I use the command line tool?

No, a login token is generated at first login. This token is valid for a fixed period of time (currently 30 days) and then has to be renewed by performing a login operation. You do not need to wait for 30 days: the login token can be renewed at any time you wish.

Can I use the command line tool non-interactively?

Yes, it can be used non-interactively when a login token exists. From time to time an interactive session of the command line tool is necessary in order to renew the login token. The command line tool returns proper exit codes so that the success or failure of a program call can automatically be evaluated.
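
Because the tool returns meaningful exit codes, a batch script can react to failures automatically. The following is a minimal sketch and not an official DKRZ helper: the function name run_and_check is made up for illustration, and the slk call in the usage comment uses placeholder paths.

```shell
# Hypothetical helper (not part of slk or slk_helpers):
# run any command and report success or failure based on its exit code.
run_and_check() {
    "$@"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "OK: $*"
    else
        echo "FAILED with exit code $rc: $*" >&2
    fi
    return "$rc"
}

# Usage in a batch script (placeholder paths):
#   run_and_check slk archive /work/ab1234/data.nc /arch/ab1234/data || exit 1
```

The helper passes the original exit code on, so the calling script can still decide whether to stop or to continue.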

Can I access archived data from outside the DKRZ?

Currently, data in the tape archive can only be accessed via mistral. New interfaces for access from outside of the DKRZ infrastructure are planned.

Do I have write access to the archive from outside the DKRZ?

No. slk is not made for data transfer via the internet. The only exception is institutions that have a direct network connection to DKRZ and currently have access to the HPSS via pftp. Users from these institutions will be able to use slk.

Will the new system be available as a Globus endpoint for external transfers?

No, not at the moment and not in the near future.

Does the tape quota (/arch, /doku), which was assigned to my computing time project, remain unchanged?

Yes, your tape quota remains the same.

How do I create directories in the HSM?

An slk mkdir command does not exist, but slk_helpers provides one. Use slk_helpers mkdir /ex/am/ple/dir if /ex/am/ple already exists and you only want to create dir. If you want to create several nested folders (like mkdir -p does), please use slk_helpers mkdirs /ex/am/ple/dir. If you want to use only slk and not slk_helpers, please proceed as follows: create empty directories locally, fill them with non-empty dummy files and archive them via slk archive -R. An example for this process is given in the Use Case section of the HSM documentation.
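
The slk-only workaround can be prepared locally with a few lines of shell. This is a rough sketch under assumptions: the function name make_dummy_tree and the file name dummy.txt are invented here, and the archive target in the usage comment is a placeholder. Please see the Use Case section of the HSM documentation for the exact slk archive call.

```shell
# Hypothetical helper: create local directories and put a small non-empty
# dummy file into each of them, so that the tree can be archived recursively.
make_dummy_tree() {
    base=$1
    shift
    for rel in "$@"; do
        mkdir -p "$base/$rel" || return 1
        echo "placeholder" > "$base/$rel/dummy.txt" || return 1
    done
}

# Usage (local path and archive target are placeholders):
#   make_dummy_tree /tmp/skeleton ex/am/ple/dir
#   slk archive -R /tmp/skeleton/ex GNS_TARGET
```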

Do I manually need to check the integrity of archived and retrieved files?

StrongLink does not automatically check the integrity of archived and retrieved files. However, it validates checksums when files are copied from one internal location to another. You can do a manual checksum comparison as described in the answer to “Does StrongLink automatically check the integrity of archived and retrieved files?”.
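
A manual comparison could look like the following sketch. It assumes that you already obtained the checksum of the archived file (for example via slk_helpers checksum) and that this checksum is a SHA-512 hex digest. Both the checksum type and the function name checksum_matches are assumptions made for illustration, so please adapt the checksum tool if StrongLink reports a different type.

```shell
# Hypothetical helper: compare the SHA-512 checksum of a local file with an
# expected value (for example, one obtained from the archive).
# Assumption: the expected checksum is a SHA-512 hex digest.
checksum_matches() {
    file=$1
    expected=$2
    actual=$(sha512sum "$file" | awk '{ print $1 }')
    [ "$actual" = "$expected" ]
}

# Usage (archive_checksum holds the value reported for the archived file):
#   if checksum_matches data.nc "$archive_checksum"; then echo "checksums match"; fi
```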

Is there an option to continue archiving if it was interrupted?

If the archival of several files was interrupted, slk archive will, after being restarted, upload only those files that are not already present in the target folder. If only a part of a file was uploaded at the time of interruption, the upload of the whole file is restarted when the archival process is restarted. After an archival process has been stopped, a file fragment remains in the archive. Either the archival process has to be resumed (the fragment is overwritten then) or the fragment has to be deleted manually. File fragments have basic metadata attached to them but no checksums. If you can get a checksum for a particular file from StrongLink (slk_helpers checksum ...), the file was archived successfully.

Does any command exist for deleting files immediately from /work in case of successful archival?

No, such a tool does not exist. We currently do not plan to provide such a tool.

Is it possible to archive into my existing folder structure created on HPSS?

Yes, the folder structure and write permissions remained untouched. The only exception is that the root folder /hpss was dropped.

Is there a “double” storage feature as for HPSS?

Yes, there is a “double” storage feature. In future, slk list will indicate via an additional column whether a duplication has taken place already. Please see the chapter “Storage options and quota” in the new HSM documentation for details.

What does “namespace”, “global namespace” or “gns” mean?

StrongLink uses the term “namespace” or “global namespace” (=”gns”). A “(global) namespace” is comparable to a “directory” or “path” on a common file system.

How do I automatically/non-interactively check whether I own a valid slk login token?

slk does not yet provide a command that returns the status of the login token as true/false, valid/invalid or similar. However, you can check the validity of your login token via slk_helpers session. If you want to check the status of the login token without using slk_helpers, please use one of the following two commands:
# command 1:
$ slk list /dummy_input < /dev/null > /dev/null 2>&1

# command 2:
$ test `jq .expireDate ~/.slk/config.json` -gt `date +%s`

$? will be 0 if login token is valid and 1 if not. Thanks to Karl-Hermann Wieners for the first command.

You need to have the program jq available for the second command. jq is installed in /sw/rhel6-x64/devtools/jq-1.6-gcc48/bin/jq. You might add /sw/rhel6-x64/devtools/jq-1.6-gcc48/bin/ to your PATH or set an alias.
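
If jq is not available, the same check can be approximated with sed. The following is only a sketch: the function name slk_token_valid is invented here, and it assumes that expireDate is stored in the config file as a plain integer in seconds since the epoch.

```shell
# Hypothetical helper: succeed (exit code 0) if the expireDate stored in the
# given slk config file lies in the future, fail (exit code 1) otherwise.
# Assumption: expireDate is stored as a plain integer (seconds since epoch).
slk_token_valid() {
    config=${1:-$HOME/.slk/config.json}
    expire=$(sed -n 's/.*"expireDate"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$config" 2>/dev/null | head -n 1)
    [ -n "$expire" ] && [ "$expire" -gt "$(date +%s)" ]
}

# Usage:
#   if slk_token_valid; then echo "token valid"; else echo "please run slk login"; fi
```

A missing config file or a missing key is treated like an expired token.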

Is my slk login token still valid?

How do I check for how long my login token is still valid?

The simplest way to do this is to call slk_helpers session. Alternatively, you can read the expiration date of the login token from the slk config file (~/.slk/config.json), where it is stored under the key expireDate. The expiration date is given in seconds since 1970-01-01 00:00:00 UTC. You can convert it into a human-readable form via date -d @SECONDS. You might open the config file with a text editor or print its content with tools like cat or less. The following command reads and converts the value in one step:
date -d @`jq .expireDate ~/.slk/config.json`

You need to have the program jq available. jq is installed in /sw/rhel6-x64/devtools/jq-1.6-gcc48/bin/jq. You might add /sw/rhel6-x64/devtools/jq-1.6-gcc48/bin/ to your PATH or set an alias.
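
To see the remaining validity in days rather than as a date, the expiration time can be compared with the current time. This is a small sketch with an invented function name (days_left) that only does the arithmetic on seconds since the epoch.

```shell
# Hypothetical helper: print the number of full days between "now" and an
# expiration time, both given in seconds since 1970-01-01 00:00:00 UTC.
# The second argument defaults to the current time.
days_left() {
    expire=$1
    now=${2:-$(date +%s)}
    echo $(( (expire - now) / 86400 ))
}

# Usage with the jq call shown above:
#   days_left "$(jq .expireDate ~/.slk/config.json)"
```

A result of zero or below means that less than one full day is left or that the token has already expired.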

Can I provide a file list to “slk archive” such as “-T” for “tar”?

Currently, this is not possible.
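
As a workaround, a file list can be processed in a shell loop. The sketch below is not a feature of slk: the function name archive_from_list is invented, and it assumes that slk archive is called as slk archive SOURCE TARGET.

```shell
# Hypothetical helper: archive every file named in a list file (one path per
# line) into one target namespace, similar in spirit to "tar -T".
# Assumption: slk archive is called as "slk archive SOURCE TARGET".
archive_from_list() {
    list=$1
    target=$2
    failed=0
    while IFS= read -r path; do
        [ -n "$path" ] || continue          # skip empty lines
        # stdin is redirected so that slk cannot consume the file list
        slk archive "$path" "$target" < /dev/null || failed=1
    done < "$list"
    return "$failed"
}

# Usage (placeholder names):
#   archive_from_list files_to_archive.txt /arch/ab1234/target
```

Each iteration starts a separate slk process, so this sketch is probably only reasonable for a modest number of files.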

Can a user run multiple archival and retrieval requests at a time?

Yes, that is possible. However, on interactive nodes (mlogin10X, mistralppY) or in interactive sessions (via salloc) we suggest running only one slk call per user and node to avoid memory issues. For details please see Where and how to use slk in the documentation.

Where on mistral should I run slk?

slk uses a lot of CPU time and memory. Therefore, slk archive and slk retrieve should only be used for small amounts of data on the login nodes (mlogin10X). For large amounts of data, we suggest using the compute/compute2 nodes or the interactive mistralpp nodes. For details please see Where and how to use slk in the documentation.

How does slk archive the files: does it tar them itself (similar to packems) or should we tar the files before hand?

slk does not pack or tar files. Metadata from netCDF files is automatically imported into the StrongLink database to simplify search and retrieval later on. With respect to this metadata import feature, archiving netCDF files directly is preferable. However, many small files are bad for tape performance and might cost additional storage space (for details see StrongLink HSM -> Packing of data). Therefore, using packems is reasonable when a large number of very small files has to be archived.
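
For illustration, many small files can also be packed by hand with plain tar before archival. This is a minimal sketch with an invented function name (pack_directory). It is not a replacement for packems, and the metadata of packed files is not ingested by StrongLink.

```shell
# Hypothetical helper: pack all files below a directory into one tar file,
# so that one large file instead of many small ones is archived.
pack_directory() {
    src_dir=$1
    tar_file=$2
    # -C changes into the parent directory so the tar file stores relative paths
    tar -cf "$tar_file" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Usage (placeholder paths):
#   pack_directory /work/ab1234/run1/logs /work/ab1234/run1_logs.tar
```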

Are there requirements on the file size for the tape archival?

Preferred file size: 10 GB to 100 GB. Please use striping to retrieve files larger than 50 GB. Lower size limit: small files are not optimal for tape storage. Therefore, we encourage users to pack small files if there is no need to use the netCDF metadata features of StrongLink. Upper size limit: We are testing bigger file sizes right now, but for the first weeks we recommend the same sizes as for HPSS (max. 500 GB). However, this might change soon.
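
The size recommendations can be checked before archival with a small script. The following sketch uses an invented function name (classify_size) and assumes that 1 GB means 1000^3 bytes. The thresholds are the ones given above and might change.

```shell
# Hypothetical helper: classify a file size (in bytes) against the
# recommendations above. Assumption: 1 GB = 1000^3 bytes.
classify_size() {
    bytes=$1
    gb=1000000000
    if [ "$bytes" -gt $(( 500 * gb )) ]; then
        echo "above recommended maximum"
    elif [ "$bytes" -gt $(( 100 * gb )) ]; then
        echo "larger than preferred"
    elif [ "$bytes" -ge $(( 10 * gb )) ]; then
        echo "preferred"
    else
        echo "smaller than preferred"
    fi
}

# Usage (GNU stat prints the size in bytes with -c %s):
#   classify_size "$(stat -c %s myfile.nc)"
```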

Additional features

Which new features does the HSM System provide?

Extended metadata is harvested from many archived files. In future, individual files can be searched for by slk search based on these extended metadata. Currently, slk search is deactivated. A more user-friendly command line tool is planned to be made available in future.

From which file types are extended metadata harvested?

Harvesting from netCDF files is implemented in StrongLink. Further formats are being investigated and could be introduced later after functional issues of slk are fixed.

Which metadata fields are harvested from netCDF files?

Most global attributes and variable names of netCDF files are stored in a metadata database. It is possible to search for each of these global attributes. Hence, properly self-described and standardized files are easier to find later on. These metadata are read-only. Metadata from a standardized subset of global attributes are copied into an indexed metadata database. These can be modified and searched more efficiently. Please see the DKRZ documentation page on metadata schemata for details.

Is there a python interface available?

Yes, we offer a python wrapper package. See for details: https://hsm-tools.gitlab-pages.dkrz.de/pyslk

Is it possible to use slk chmod and slk group (=chgrp) commands recursively by the user?

Yes, it is possible. Please provide -R to apply these commands recursively. slk group is currently deactivated.

Are the search IDs user specific?

No, the search IDs are assigned globally. E.g. the search ID 423 exists only once. Each search ID can be used by every user. Thus, you can share your search IDs with your colleagues. However, the output of slk list SEARCH_ID or retrieval of slk retrieve SEARCH_ID ... depends on the read permissions of the executing user.

How long are the search IDs stored?

This is not decided yet. This is configurable by the administrators. We will monitor whether (or when) a performance degradation takes place and act accordingly.

Is a search ID automatically updated when new files are archived which match the original search query?

No, the IDs of files matching the search query are stored once when the search is performed. The list of these file IDs will not be updated afterwards, except if files on the list are deleted. However, file-specific metadata, such as file size or permissions, are retrieved at the time when the search ID is used. slk list SEARCH_ID will show today’s sizes of the files covered by the search ID SEARCH_ID. Files that matched the search query when the search was performed are still listed by slk list even if they no longer match the original search query. This might happen if a file is renamed.

Can I share my search’s search ID with other DKRZ users?

Yes, you can. Please see “Are the search IDs user specific?” for details.

What does “RQL” mean?

RQL abbreviates “resource query language” and is another name for the “StrongLink Query Language”.

Is there any possibility to move around in the filesystem with something like the cd command?

No, this is not possible. slk does not start its own shell like pftp or pure ftp do. It rather works like scp.

When slk list shows a file with “-” (not “t”) which means it exists at the cache: Does that mean it is not yet on the tape?

Right now, it means that the file is in the cache. It may additionally be on tape. If the t is shown, the file is only on tape. We intend to indicate both states at some point.

For a better overview of the archived files, is there a possibility to list only folders, not all files?

When you use slk list with a specific directory path, it shows the files and directories located directly in that directory. If you use the -R flag, it additionally shows the content of all subdirectories recursively. So if you want a clean overview, omit -R. You might use slk list GNS_PATH | grep -E "^d" to print only folders.

Is it possible to remove files from the archive?

Yes, you can use slk delete for removing files and slk delete -R for removing directories.

How to print the version of slk?

Please run slk version to print the version of slk. A --version flag or similar does not exist.

Advanced Technical Aspects

Can a user influence if data is written into the HSM cache or onto tape?

No. Fresh data (meant for archival) is first copied into the disc cache and then slowly written onto tape. When data is retrieved from tape, it is first copied into the disc cache and from there to the user-defined target file system.

How much time does a file stay on the cache?

We cannot give any numbers. The residence time in the cache depends on the size of the files and the usage of the cache. We run clean-up jobs regularly and monitor how fast the cache is filled.

How fast can data be read from the HSM?

The target transfer rate between single nodes on mistral and the HSM cache is 1 GB/s. It might be reduced when the traffic is high. The retrieval rate from tape depends considerably on how many read and write operations of other users are performed in parallel.
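
As a back-of-the-envelope illustration, the target rate gives a lower bound for the transfer time between a node and the HSM cache. The function name min_transfer_seconds is invented here. Real transfers, especially those that involve tape, will usually take longer.

```shell
# Hypothetical helper: lower bound for the transfer time in seconds, given a
# size in GB and a rate in GB/s (default: the 1 GB/s target rate).
# Only whole numbers are supported.
min_transfer_seconds() {
    size_gb=$1
    rate_gbps=${2:-1}
    # integer division, rounded up
    echo $(( (size_gb + rate_gbps - 1) / rate_gbps ))
}

# Usage: a 100 GB file at the target rate needs at least 100 seconds
#   min_transfer_seconds 100
```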

How do I determine the id (uid) of a DKRZ user?

Please use one of the following commands:
# get your user id
$ id -u

# get the id of any user
$ id -u USER_NAME

# get the id of any user
$ getent passwd USER_NAME
#  OR
$ getent passwd USER_NAME | awk -F: '{ print $3 }'

How do I determine the id (gid) of a DKRZ group?

Please use one of the following commands:
# get group ID and group members
$ getent group GROUP_NAME
#  OR
$ getent group GROUP_NAME | awk -F: '{ print $3 }'

# get the names and ids of all groups of which you are a member
$ id

How do I determine the username of a DKRZ user when I have her/his id (uid)?

Please use one of the following commands:
# get the name of a user with uid USER_ID
$ getent passwd USER_ID
#  OR
$ getent passwd USER_ID | awk -F: '{ print $1 }'

How do I determine the group name of a DKRZ group when I have its id (gid)?

Please use one of the following commands:
# get group name of a group with gid GROUP_ID
$ getent group GROUP_ID
#  OR
$ getent group GROUP_ID | awk -F: '{ print $1 }'

How do I determine the MIME type of a file?

You could use file --mime-type FILE or file -b --mime-type FILE to determine the MIME type on the Linux shell. Please be aware that different tools determine the MIME type differently (i.e. by file header or by file extension) and MIME type databases might differ. It might be better not to search for a specific MIME type but for a particular file extension – e.g. via {"resources.name": {"$regex": ".*nc$"}}. StrongLink allocates the MIME type application/x-netcdf to netCDF files.

Can the search ID of slk search be captured by a shell variable?

slk search (currently deactivated) does not provide this feature out of the box. Currently (might change in future versions), the search ID is printed in columns >= 12 of the second row of the text output of slk search. We can use tail and sed to get the second line and extract a number or use tail and cut to get the second line and drop the first 11 characters. Example:
# normal call of slk search
$ slk search '{"resources.posix_uid": 23501}'
Total resources found: 11. Search complete.
Search ID: 466

# get ID using sed:
$ search_id=`slk search '{"resources.posix_uid": 23501}' | tail -n 1 | sed 's/[^0-9]*//g'`
$ echo $search_id
470

# get ID by dropping first 11 characters of the second line
$ search_id=`slk search '{"resources.posix_uid": 23501}' | tail -n 1 | cut -c12-20`
$ echo $search_id
471

# use awk pattern matching to get the correct line and correct column
$ search_id=`slk search '{"resources.posix_uid": 23501}' | awk '/Search ID/ {print($3)}'`
$ echo $search_id
507

Note

This is an example for bash. When using csh, you need to prepend set to the assignments of the shell variables: set search_id=....
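
The extraction can also be wrapped into a small function that fails when no search ID is found. This is a sketch with an invented function name (extract_search_id). It relies on the output format shown above, which might change in future versions of slk.

```shell
# Hypothetical helper: read the text output of "slk search" from stdin and
# print the search ID. Fails if no line of the form "Search ID: NUMBER" exists.
extract_search_id() {
    id=$(sed -n 's/^Search ID:[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' | head -n 1)
    [ -n "$id" ] && echo "$id"
}

# Usage:
#   search_id=$(slk search '{"resources.posix_uid": 23501}' | extract_search_id) \
#       || echo "no search ID found" >&2
```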

Is the metadata of files within zip/tar files evaluated/ingested?

No, the metadata of packed files is not ingested.

Does the packems package work with the new HSM system?

Yes, packems has been adapted to the new HSM system in cooperation with the MPI-M. Please have a look into the packems manual for details and usage of packems: https://code.mpimet.mpg.de/projects/esmenv/wiki/Packems.

Is it possible to use listems to list files that were archived with packems on the HPSS?

Yes. All files archived with packems onto the HPSS can be listed with listems.

Is it possible to use unpackems to retrieve files that were archived with packems on the HPSS?

Yes. All files archived with packems onto the HPSS can be retrieved with unpackems.

Can you work directly with files in the archive (e.g. with Python)?

No, you have to download files to change them and archive them again.

Common issues

error “conflict with jdk/…” when the slk module is loaded

slk needs a specific Java version that is automatically loaded with slk. Having other Java versions loaded in parallel might cause unwanted side effects. Therefore, the system throws an error message and aborts.

slk needs a specific Java version

You might encounter an error like this:

$ slk list 12
CLI tools require Java 13 (found 1)

slk needs a specific Java version. This Java version is automatically loaded when the slk module is loaded. If you have loaded another Java module explicitly, please unload it prior to loading the slk module. If you loaded slk already, please: (1) unload slk, (2) unload all Java modules and (3) load slk again.

slk search yields RQL parse error

ERROR: Search failed. Reason: RQL parse error: No period found in collection field name ().

Either: Please consider using ' around your search query instead of " to prevent operators starting with $ to be evaluated as bash variables.

Or: Please escape each $ that belongs to a query operator when you use " as delimiter of the query string.
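
The effect of the different quoting styles can be tried out without calling slk at all. The sketch below only builds and prints query strings. The variable names are chosen for illustration.

```shell
regex=""   # stands in for the usual situation that no variable "regex" is set

# Single quotes: the shell leaves $regex untouched.
query_single='{"resources.name": {"$regex": ".*nc$"}}'

# Double quotes with escaped $ and ": same result as above.
query_escaped="{\"resources.name\": {\"\$regex\": \".*nc\$\"}}"

# Double quotes without escaping $: the shell replaces $regex by an empty string.
query_broken="{\"resources.name\": {\"$regex\": \".*nc$\"}}"

printf '%s\n' "$query_single" "$query_escaped" "$query_broken"
```

The first two lines of the output are identical. The third one contains an empty string in place of $regex, which is presumably why the RQL parser complains about an empty field name.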

slk login asks me to provide a hostname and/or a domain

If you are asked for this information the configuration is faulty. Please contact support@dkrz.de and tell us on which machine you are working.

Session key has expired

The error message WARNING: Session key has expired. Please login again: (interactive usage) or ERROR Session key has expired, unable to login in non-interactive mode (non-interactive usage; e.g. in SLURM batch script) is printed but the session key is not yet 30 days old.

This error/warning might be printed if the connection to StrongLink is not stable or if StrongLink is overloaded. Please contact support@dkrz.de if you experience it.

In non-interactive mode, ERROR Session key has expired, unable to login in non-interactive mode is printed in this situation.

Login Unsuccessful - Incorrect Credentials

You log in to slk with the correct login credentials but get the error Login Unsuccessful - Incorrect Credentials. This is caused by an internal issue in StrongLink that temporarily prevents new logins. Please contact support@dkrz.de if you experience it.

Archival fails and Java NullPointerException in the log

This error message is printed in the log:

2021-07-13 08:33:03 ERROR Unexpected exception
java.lang.NullPointerException: null
    at com.stronglink.slkcli.api.websocket.NodeThreadPools.getBestPool(NodeThreadPools.kt:28) ~[slk-cli-tools-3.1.62.jar:?]
    at com.stronglink.slkcli.archive.Archive.upload(Archive.kt:191) ~[slk-cli-tools-3.1.62.jar:?]
    at com.stronglink.slkcli.archive.Archive.uploadResource(Archive.kt:165) ~[slk-cli-tools-3.1.62.jar:?]
    at com.stronglink.slkcli.archive.Archive.archive(Archive.kt:77) [slk-cli-tools-3.1.62.jar:?]
    at com.stronglink.slkcli.SlkCliMain.run(SlkCliMain.kt:169) [slk-cli-tools-3.1.62.jar:?]
    at com.stronglink.slkcli.SlkCliMainKt.main(SlkCliMain.kt:103) [slk-cli-tools-3.1.62.jar:?]
2021-07-13 08:33:03 INFO

This error indicates that there is an API issue. A reason might be that one or more StrongLink nodes went offline and the other nodes did not take over their connections yet. Please notify support@dkrz.de if you experience this error.

Terminal cursor disappears after stopping a slk command

If a slk command with a progress bar is canceled by the user, the shell cursor might disappear. One can make it re-appear by (a) running reset or (b) starting vim and leaving it directly (:q!).
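
In scripts, the cursor can also be restored programmatically. The following sketch is a general terminal technique and not specific to slk. The function name restore_cursor is invented for illustration.

```shell
# Hypothetical helper: make the terminal cursor visible again.
# "tput cnorm" asks the terminal database for the "cursor normal" sequence;
# the printf fallback sends the common ANSI/VT sequence for "show cursor".
restore_cursor() {
    tput cnorm 2>/dev/null || printf '\033[?25h'
}

# Usage: restore the cursor automatically when the script exits
#   trap restore_cursor EXIT
```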

Changelog

v1.29, 18 November 2021

v1.28, 12 November 2021

v1.27, 11 November 2021

v1.26, 01 November 2021

v1.25, 27 October 2021

v1.19, 20 September 2021

  • changed title of FAQ

  • corrected FAQ’s Changelog

v1.18, 17 September 2021

  • added cross-references

  • minor layout changes

v1.17, 17 September 2021

v1.14, 12 July 2021

v1.11, 06 May 2021

v1.10, 23 April 2021

v1.06, 08 March 2021

v1.02, 12 February 2021

v1.01, 28 January 2021

  • first public version