FAQ

v1.34, 19 April 2022

General information about the HSM system

What does HSM mean?

Hierarchical Storage Management. It means the DKRZ tape archive.

What type of HSM system is used at DKRZ?

The software is called StrongLink and it is developed and supplied by StrongBox Data Solutions (https://www.strongboxdata.com/stronglink).

Why did DKRZ get a new system?

The contract for the previous HSM system, HPSS, ended in 2021, and HPSS was not designed to cope with the data volumes expected from DKRZ’s new HPC system Levante. Therefore, DKRZ ran a tender to purchase a new HSM system. The company Cristie Data was chosen to deliver the new HSM system, called StrongLink. The new system allows for a higher data throughput to and from tape and is able to cope with the larger volumes of data expected to be produced on Levante.

What are the main differences compared to the old system?

From a user’s perspective, a different command line tool has to be used to transfer data into and out of the tape archive. The new tool provides powerful metadata features. For details, have a look at “Which new features does the HSM System provide?”. Technically, the new HSM system StrongLink allows for higher data throughput, is more scalable and is more resilient towards hardware failures than the old HPSS system.

Is the new HSM system accessible via pftp?

No. pftp was replaced by a new command line tool, slk. Additionally, a command line tool slk_helpers is provided, which has some features that slk is lacking. slk is developed and maintained by StrongBox Data Solutions, whereas slk_helpers is developed and maintained at DKRZ.

Would it be possible/desirable to use only one command for slk and slk_helpers main classes?

Basically: yes. We decided otherwise because the development of slk is still ongoing and we do not want to clash with the development by StrongBox: features that we implement now might later be implemented by StrongBox with slight functional changes.

Data Migration

When did the new HSM system go online?

The new HSM system went online on 1 November 2021 at 2 PM.

Are my archived data available on the new system?

All data were migrated automatically from HPSS to StrongLink. The exception is files that originate from the DXUL (Disc EXtended Unix Linux) system and were located on HPSS in the directory /dxul and below. These files were not migrated.

How do I find out whether I have data from DXUL that have to be copied manually?

If you own or use legacy data created on HLRE-1 (hurrikan) and earlier (before 2010), please check if there are data in /dxul and below. If you are working with data produced on HLRE-2 (blizzard), HLRE-3 (mistral) and HLRE-4 (levante) then you are probably not affected. However, there might be config or forcing files of more recent simulations still located in /dxul.

How do I access DXUL data after the HPSS is shut down?

It is no longer possible to access the DXUL data via the new HSM system.

How to proceed if I still have DXUL data that need to be kept?

Please contact beratung@dkrz.de if you still need data from DXUL.

Training, Questions and Adaption of Workflows

Has there been an introduction session to the new HSM system and will there be such sessions in future?

Yes, a DKRZ Tech Talk took place on 6 July 2021. The new HSM system and the new command line tool slk were presented there. A recording of the Tech Talk is available on YouTube: https://www.youtube.com/watch?v=JtmelPQ3ypw. We offered an HSM Q&A session each Thursday from 11:30 AM to 12:30 PM from 30 September to the end of December 2021. We plan a second Tech Talk when the system is fully functional.

Where can I find written documentation about the new HSM system?

The user documentation is available at https://docs.dkrz.de.

Why is no exact time schedule for training and migration published yet?

A Tech Talk that gave a broad overview of the StrongLink system and of slk took place on 6 July 2021 (https://youtu.be/JtmelPQ3ypw). The HPSS was taken offline on 15 October 2021 (user access deactivated at 8 AM). StrongLink went online on 1 November 2021 at 2 PM. We offered an HSM Q&A session each Thursday from 11:30 AM to 12:30 PM from 30 September to the end of December 2021. We plan a second Tech Talk when the system is fully functional.

Who do I contact when I have questions or issues regarding the new HSM system and its usage?

Please contact us via support@dkrz.de or join our HSM Q&A session each Thursday from 11:30 AM to 12:30 PM, starting on 30 September (meeting URL: https://global.gotomeeting.com/join/681975669).

Archiving and Retrieval

How do I interact with the new system?

The command line tool for tape access is called slk. Some features are deactivated for now. Additionally, slk is missing a few small but very useful features. Therefore, a tool called slk_helpers was written at DKRZ to add these features. Details on these two tools are provided in the HSM documentation at https://docs.dkrz.de.

Where can I use slk and slk_helpers?

slk and slk_helpers are installed as module slk on all mistral and levante nodes. On the login nodes (mlogin100 to mlogin108; levante1 to levante7), please use slk archive and slk retrieve only for a few small files of MB size or below. For details on further restrictions and recommendations, please have a look at the documentation (section “On which nodes to run slk” on the page “Getting Started with slk”).

Can I still use pftp to interact with the new HSM system?

No. A new command line tool is provided (please see “How do I interact with the new system?”).

How do I login to the HSM system?

Login is done via the command line tool using your DKRZ credentials (LDAP; as used for luv). The command line tool stores a login token for a specific time period (currently 30 days), so you do not need to log in again during that period.

Does Kerberos authentication work on the new HSM system?

No, Kerberos does not work anymore. You only need to provide your login data to the command line tool in certain time intervals (currently 30 days).

Do I have to provide my login credentials each time I use the command line tool?

No, a login token is generated at first login. This token is valid for a fixed period of time (currently 30 days) and then has to be renewed by performing a login operation. You do not need to wait for 30 days: the login token can be renewed at any time you wish.

Can I use the command line tool non-interactively?

Yes, it can be used non-interactively when a login token exists. From time to time an interactive session of the command line tool is necessary in order to renew the login token. The command line tool returns proper exit codes so that the success or failure of a program call can automatically be evaluated.
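Because slk returns exit codes, a batch script can evaluate each call automatically. The following is a minimal sketch and not an official DKRZ tool: the wrapper name slk_checked is hypothetical, and the sketch only assumes that slk returns 0 on success and a non-zero value on failure.

```shell
# Hypothetical wrapper (not part of slk): run any slk subcommand and
# report success or failure based on the exit code of slk.
slk_checked() {
    slk_rc=0
    slk "$@" || slk_rc=$?
    if [ "$slk_rc" -eq 0 ]; then
        echo "slk $1: success"
    else
        echo "slk $1: failed with exit code $slk_rc" >&2
    fi
    # hand the original exit code back to the calling script
    return "$slk_rc"
}

# Usage inside a job script (paths are placeholders):
#   slk_checked archive /work/project/file.nc /arch/project/dir || exit 1
```

The wrapper passes the exit code through unchanged, so the calling script can still decide how to react to a failure.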

Can I access archived data from outside the DKRZ?

Currently, data in the tape archive can only be accessed via mistral and levante. New interfaces for access from outside of the DKRZ infrastructure are planned.

Do I have write access to the archive from outside the DKRZ?

No. slk is not made for data transfer via the internet. The exception is institutions that have a direct network connection to DKRZ and currently have access to the HPSS via pftp. Users from these institutions will be able to use slk.

Will the new system be available as a Globus endpoint for external transfers?

No, not at the moment and not in the near future.

Does the tape quota (/arch, /doku), which was assigned to my computing time project, remain unchanged?

Yes, your tape quota remains the same.

How do I create directories in the HSM?

slk does not provide a mkdir command, but slk_helpers does. Use slk_helpers mkdir /ex/am/ple/dir if /ex/am/ple already exists and you only want to create dir. If you want to create several nested folders (like mkdir -p does), please use slk_helpers mkdir -R /ex/am/ple/dir. If you want to use only slk and not slk_helpers, please proceed as follows: create empty directories locally, fill them with non-empty dummy files and archive them via slk archive -R. An example of this process is given in the Use Case section of the HSM documentation.

Do I manually need to check the integrity of archived and retrieved files?

Is there an option to continue archiving if it was interrupted?

If the archival of several files was interrupted, slk archive will not upload all files a second time after its restart, but only those files that are not already present in the target folder. If only a part of a file had been uploaded at the time of the interruption, the upload of the whole file is restarted when the archival process is restarted. After an archival process is stopped, a file fragment remains in the archive. Either the archival process has to be resumed (the fragment will then be overwritten) or the fragment has to be deleted manually. File fragments have basic metadata attached to them but no checksums. If you can get a checksum for a particular file from StrongLink (slk_helpers checksum ...), the file was archived successfully.

Does any command exist for deleting files immediately from /work in case of successful archival?

No, such a tool does not exist. We currently do not plan to provide such a tool.

Is it possible to archive into my existing folder structure created on HPSS?

Yes, the folder structure and write permissions remained untouched, except that the root folder /hpss was dropped.

Is there a “double” storage feature as for HPSS?

Yes, there is a “double” storage feature. Please see the chapter “Storage options and quota” in the new HSM documentation for details.

What does “namespace”, “global namespace” or “gns” mean?

StrongLink uses the term “namespace” or “global namespace” (=”gns”). A “(global) namespace” is comparable to a “directory” or “path” on a common file system.

How do I automatically/non-interactively check whether I own a valid slk login token?

slk does not yet provide a command that returns the status of the login token as true/false, valid/invalid or similar. However, you can check the validity of your login token via slk_helpers session. If you do not want to use slk_helpers but want to check the status of the login token anyway, please use one of the following two commands:
# command 1:
$ slk list /dummy_input < /dev/null > /dev/null 2>&1

# command 2:
$ test `jq .expireDate ~/.slk/config.json` -gt `date +%s`

$? will be 0 if login token is valid and 1 if not. Thanks to Karl-Hermann Wieners for the first command.

You need to have the program jq available for the second command. jq is installed in /sw/rhel6-x64/devtools/jq-1.6-gcc48/bin/jq. You might add /sw/rhel6-x64/devtools/jq-1.6-gcc48/bin/ to your PATH or set an alias.
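The first command can be wrapped into a guard for batch scripts. This is a minimal sketch under the assumption stated above, namely that slk list returns 0 if the login token is valid; the function names slk_token_valid and require_slk_token are hypothetical.

```shell
# Hypothetical helper: succeeds if slk accepts the stored login token.
# Uses "command 1" from above; stdin is closed so slk cannot prompt.
slk_token_valid() {
    slk list /dummy_input < /dev/null > /dev/null 2>&1
}

# Hypothetical guard: print a hint and fail if no valid token exists.
require_slk_token() {
    if ! slk_token_valid; then
        echo "No valid slk login token; please run 'slk login' interactively." >&2
        return 1
    fi
}

# Usage at the top of a job script:
#   require_slk_token || exit 1
```

Calling the guard before the first archival or retrieval lets a job stop early instead of failing in the middle of a transfer.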

Is my slk login token still valid?

How do I check for how long my login token is still valid?

slk_helpers session will print the expiration date. Alternatively, the date/time until which the login token is valid is stored in the slk config file (~/.slk/config.json). The key is expireDate. You might open the config file with a text editor or print its content with tools like cat, less or jq.
jq .expireDate ~/.slk/config.json

You need to have the program jq available. jq is installed in /sw/rhel6-x64/devtools/jq-1.6-gcc48/bin/jq on mistral and in /usr/bin/jq on levante. On mistral, you might add /sw/rhel6-x64/devtools/jq-1.6-gcc48/bin/ to your PATH or set an alias.

Can I provide a file list to “slk archive” such as “-T” for “tar”?

Currently, this is not possible.

Can a user run multiple archival and retrieval requests at a time?

Yes, that is possible. However, on interactive nodes (mlogin10X, mistralppY, levanteZ) or on shared nodes we suggest running only one slk call per user and node to avoid memory issues. For details please see the section “On which nodes to run slk” on the page “Getting Started with slk” of the documentation.

Where on mistral and levante should I run slk?

slk uses much CPU time and memory. Therefore, slk archive and slk retrieve should only be used for small amounts of data on the login nodes (mlogin10X, levanteY). For large amounts of data, we suggest using the compute/compute2 partitions. For details and alternatives please see the section “On which nodes to run slk” on the page “Getting Started with slk” of the documentation.

How does slk archive the files: does it tar them itself (similar to packems) or should we tar the files beforehand?

slk does not pack or tar files. Metadata from netCDF files is automatically imported into the StrongLink database to simplify search and retrieval later on. Direct archival of netCDF files is preferable with respect to the metadata import feature. However, many small files are bad for tape performance and might cost additional storage space (see Storage options and quota). Therefore, using packems is reasonable in the case of a large number of very small files.

Are there requirements on the file size for the tape archival?

Preferred file size: 10 GB to 100 GB. Lower size limit: small files are not optimal for tape storage, so each file smaller than 1 GB is charged as 1 GB. We encourage users to pack small files if there is no need to use the netCDF metadata features of StrongLink. Upper size limit: file sizes of a few TB are possible, but we recommend the same maximum size as for HPSS (500 GB).
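To see which local files fall below the 1 GB charging threshold before archiving, a listing like the following might help. This is a sketch only: the function name small_files is hypothetical, and the default limit assumes 1 GB = 1073741824 bytes, which the text above does not specify.

```shell
# Hypothetical helper: list regular files below DIR that are smaller
# than LIMIT bytes (default: 1073741824 bytes, an assumed value for 1 GB).
# The size is given in bytes ("c") because find rounds larger units up,
# so "-size -1G" would match only empty files.
small_files() {
    sf_dir=$1
    sf_limit=${2:-1073741824}
    find "$sf_dir" -type f -size "-${sf_limit}c"
}

# Usage (path is a placeholder):
#   small_files /work/project/experiment
```

Files reported by this listing are candidates for packing, for example with packems, unless their netCDF metadata should be searchable in StrongLink.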

I am a member of a project but cannot access this project’s data?

Please log in again via slk login. For details please see “group memberships of user updated on login”.

Why do I get “Exception …: lateinit property websocket has not been initialized”?

When running slk archive with the argument --streams N, please only use values between 1 and 4 for N. For details please see “slk archive: Exception …: lateinit property websocket has not been initialized”.

My slk archive seems to hang. What should I do?

Please check whether /home is hanging. If /home is hanging, slk cannot access its login token and cannot write into its log. Therefore, slk hangs when /home is hanging.

Please check whether you are archiving a file with a size of 0 bytes. Details are given on the Known Issues page.

Additional features

Which new features does the HSM System provide?

Extended metadata is harvested from many archived files. In future, individual files can be searched for by slk search based on these extended metadata. Currently, slk search is deactivated. Please use slk_helpers search_limited for the time being.

From which file types are extended metadata harvested?

Harvesting from netCDF files is implemented in StrongLink. Further formats are being investigated and could be introduced later after functional issues of slk are fixed. We don’t expect that this will happen soon.

Which metadata fields are harvested from netCDF files?

Most global attributes and variable names of netCDF files are stored in a metadata database. It is possible to search for each of these global attributes. Hence, properly self-described and standardized files are easier to find later on. These metadata are read-only. Metadata from a standardized subset of global attributes are copied into an indexed metadata database. These can be modified and searched more efficiently. Please see the DKRZ documentation page on metadata schemata for details.

Is there a python interface available?

Yes, we offer a Python wrapper package called pyslk. It is installed in python3/2021.01-gcc-9.1.0 and python3/unstable on mistral and in python3/2022.01-gcc-11.2.0 on levante. The slk module has to be loaded when pyslk is used. For details please see https://hsm-tools.gitlab-pages.dkrz.de/pyslk.

Is it possible to use slk chmod and slk group (=chgrp) commands recursively by the user?

Yes, it is possible. Please provide -R to apply these commands recursively.

Are the search IDs user specific?

No, the search IDs are assigned globally. E.g. the search ID 423 exists only once. Each search ID can be used by every user. Thus, you can share your search IDs with your colleagues. However, the output of slk list SEARCH_ID and the result of slk retrieve SEARCH_ID ... depend on the read permissions of the executing user.

How long are the search IDs stored?

This is not decided yet. This is configurable by the administrators. We will monitor whether (or when) a performance degradation takes place and act accordingly.

Is a search ID automatically updated when new files are archived which match the original search query?

No, the IDs of files matching the search query are stored once when the search is performed. This list of file IDs will not be updated afterwards, except if files on the list are deleted. However, file-specific metadata, such as file size or permissions, are retrieved at the time when the search ID is used. slk list SEARCH_ID will show the current sizes of the files covered by the search ID SEARCH_ID. Files that matched the search query when it was run are still listed by slk list even if they no longer match the original search query. This might happen if a file is renamed.

Can I share my search’s search ID with other DKRZ users?

Yes, you can. Please see “Are the search IDs user specific?” for details.

What does “RQL” mean?

RQL abbreviates “resource query language” and is another name for the “StrongLink Query Language”.

Is there any possibility to move around in the filesystem with something like the cd command?

No, this is not possible. slk does not start its own shell like pftp or pure ftp do. It rather works like scp.

When slk list shows a file with “-” (not “t”), which means it exists in the cache: does that mean it is not yet on tape?

Right now it means that the file is in the cache. It can be on tape as well, but not necessarily. If the t is shown, the file is only on tape. We are working on showing both states (cache and tape) at some point.

For a better overview of the archived files, is there a possibility to list only folders, not all files?

When you use slk list with a specific namespace path, it shows all the files and namespaces in that specific namespace. If you use the -R flag, it shows all the files and namespaces recursively (like ls -R). So if you want a clean overview, excluding -R would be the way. You might use slk list GNS_PATH | grep -E "^d" to print only folders.

Is it possible to remove files from the archive?

Yes, you can use slk delete for removing files and slk delete -R for removing namespaces.

How to print the version of slk?

Please run slk version to print the version of slk. A --version flag or similar does not exist.

How to search non-recursively in a namespace?

slk search cannot search non-recursively in a namespace provided via path. As a workaround, please get the object id of the particular namespace via slk_helpers exists and then use it in your search query as the value for the search field resources.parent_id (see slk Usage Examples).

Is it possible to move files within the archive?

Yes, you can use slk move to move a file or namespace from one namespace to another. Absolute paths have to be used: slk move /old_path/file.nc /new_path. Renaming cannot be done with slk move, i.e. this does not work: slk move /old_path/file.nc /new_path/new_file_name.nc. Please use slk rename for renaming operations.

Is it possible to rename files within the archive?

Yes, you can use slk rename to rename a file or namespace. slk rename cannot be applied to multiple files/namespaces.

How do I tag a folder with metadata?

Tagging folders with metadata is not possible at the moment.

How do I tag an individual file with metadata?

Tagging individual files with metadata is not possible at the moment.

slk search does not find any resources although resources exist that seem to match the query

Example command:

$ slk search '{"resources.posix_uid": "25301"}'
Search continuing. .....
Search ID: 216

Reason:

The query parser does not recognize when a wrong variable type is used. resources.posix_uid is of type integer and not string. Providing the wrong data types leads to 0 found results.

Solution:

Write 25301 (integer) instead of "25301" (string) in the search query.

$ slk search '{"resources.posix_uid": 25301}'
Search continuing. ..... Search continuing. .....

Search ID: 217
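Because the query parser does not report a wrong data type, it can help to validate the value before the query is built. The following is a minimal sketch; the function name uid_query is hypothetical and not part of slk or slk_helpers.

```shell
# Hypothetical helper: build a posix_uid query and refuse any value
# that is not a plain non-negative integer (e.g. a quoted string).
uid_query() {
    case "$1" in
        ''|*[!0-9]*)
            echo "uid must be an integer: '$1'" >&2
            return 1
            ;;
    esac
    # the value is inserted without quotes, so it stays an integer
    printf '{"resources.posix_uid": %s}\n' "$1"
}

# Usage: search for your own files
#   slk search "$(uid_query "$(id -u)")"
```

A quoted value such as "25301" is rejected by the check instead of silently producing a search with 0 results.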

Error: slk search yields RQL parse error

Example command and error:

$ slk search "{\"resources.size\":{\"$gt\": 1048576}}"
ERROR: Search failed. Reason: RQL parse error: No period found in collection field name ().

Reason:

The $ in front of gt was not escaped. Therefore, $gt is interpreted as a variable by the shell before the query is handed to slk. In most situations, no variable gt is defined, which leads to an empty string. If the query were delimited by ' and not by ", $gt would not have been interpreted.

The above call of slk search as interpreted by the shell looks like

$ slk search "{\"resources.size\":{\"\": 1048576}}"

Solution:

Either: use ' as delimiter of your search query instead of " to prevent operators starting with $ to be evaluated by your shell

Or: escape $’s in front of query operators by \ when you use " as delimiters of the query string.

'{"resources.size":{"$gt": 1048576}}'
"{\"resources.size\":{\"\$gt\": 1048576}}"

Note

In some situations it might be very useful to use " as delimiter for your queries – e.g. if environment variables are part of your query.

$ export file_size=1048576
$ slk search "{\"resources.size\":{\"\$gt\": $file_size}}"
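A possible alternative that needs no escaping at all is to keep the query in a single-quoted printf format string and pass the variable as an argument. This is a sketch; the function name size_gt_query is hypothetical.

```shell
# Hypothetical helper: build a "size greater than" query.
# The format string is single-quoted, so the shell leaves $gt untouched;
# the value is inserted by printf via %d.
size_gt_query() {
    printf '{"resources.size":{"$gt": %d}}\n' "$1"
}

# Usage:
#   file_size=1048576
#   slk search "$(size_gt_query "$file_size")"
```

The outer double quotes around the command substitution only keep the query together as one argument; the $gt inside the generated text is not expanded again.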

Advanced Technical Aspects

Can a user influence if data is written into the HSM cache or onto tape?

No. Fresh data (meant for archival) is first copied into the disc cache and then slowly written onto tape. When data is retrieved from tape, it is first copied into the disc cache and from there to the user-defined target file system.

How much time does a file stay on the cache?

We cannot give any numbers. The residence time in cache depends on the size of the files and the usage of the cache. We run clean up jobs regularly and monitor how fast the cache is filled.

How fast can data be read from the HSM?

The target transfer rate between single nodes on mistral/levante and the HSM cache is 1 GB/s. It might be higher in some situations and be reduced when the traffic is high. The retrieval rate from tape considerably depends on how many other read and write operations of other users are performed in parallel. If all tape drives are in use, slk retrieve will be idle until a tape drive is free.

How do I determine the id (uid) of a DKRZ user?

Please use one of the following commands:
# get your user id
$ id -u

# get the id of any user
$ id -u USER_NAME

# get the id of any user
$ getent passwd USER_NAME
#  OR
$ getent passwd USER_NAME | awk -F: '{ print $3 }'

How do I determine the id (gid) of a DKRZ group?

Please use one of the following commands:
# get group ID and group members
$ getent group GROUP_NAME
#  OR
$ getent group GROUP_NAME | awk -F: '{ print $3 }'

# get the names and ids of all groups of which you are a member
$ id

How do I determine the username of a DKRZ user when I have her/his id (uid)?

Please use the following command:
# get the name of a user with uid USER_ID
$ getent passwd USER_ID
#  OR
$ getent passwd USER_ID | awk -F: '{ print $1 }'

How do I determine the group name of a DKRZ group when I have its id (gid)?

Please use one of the following commands:
# get group name of a group with gid GROUP_ID
$ getent group GROUP_ID
#  OR
$ getent group GROUP_ID | awk -F: '{ print $1 }'

How do I determine the MIME type of a file?

You could use file --mime-type FILE or file -b --mime-type FILE to determine the MIME type on the Linux shell. Please be aware that different tools determine the MIME type differently (i.e. by file header or by file extension) and MIME type databases might differ. It might be better not to search for a specific MIME type but for a particular file extension – e.g. via {"resources.name": {"$regex": ".*nc$"}}. StrongLink allocates the MIME type application/x-netcdf to netCDF files.

Can the search ID of slk search be captured by a shell variable?

slk search (currently deactivated; please use slk_helpers search_limited) does not provide this feature out of the box. Currently (might change in future versions), the search ID is printed in columns >= 12 of the second row of the text output of slk search. We can use tail and sed to get the second line and extract a number or use tail and cut to get the second line and drop the first 11 characters. Example:
# normal call of slk search
$ slk search '{"resources.posix_uid": 23501}'
Search continuing. .....
Search ID: 466

# get ID using sed:
$ search_id=`slk search '{"resources.posix_uid": 23501}' | tail -n 1 | sed 's/[^0-9]*//g'`
$ echo $search_id
470

# get ID by dropping first 11 characters of the second line
$ search_id=`slk search '{"resources.posix_uid": 23501}' | tail -n 1 | cut -c12-20`
$ echo $search_id
471

# use awk pattern matching to get the correct line and correct column
$ search_id=`slk search '{"resources.posix_uid": 25301}' | awk '/Search ID/ {print($3)}'`
$ echo $search_id
507

Note

This is an example for bash. When using csh, you need to prepend set to the assignments of the shell variables: set search_id=....
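In scripts it can be useful to fail explicitly when no search ID was printed, for example after a parse error. The sketch below builds on the awk variant above and assumes the output format shown in the examples (a line of the form "Search ID: N"); the function name extract_search_id is hypothetical.

```shell
# Hypothetical helper: read the text output of a search on stdin,
# print the search ID and return 1 if no numeric ID was found.
extract_search_id() {
    sid=$(awk '/Search ID/ { print $3 }' | tail -n 1)
    case "$sid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    printf '%s\n' "$sid"
}

# Usage:
#   search_id=$(slk search '{"resources.posix_uid": 25301}' | extract_search_id) \
#       || echo "search did not return an ID" >&2
```

Since the output format might change in future versions, the numeric check keeps a script from continuing with an empty or garbled ID.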

Is the metadata of files within zip/tar files evaluated/ingested?

No, the metadata of packed files is not ingested.

Does the packems package work with the new HSM system?

Yes, packems has been adapted to the new HSM system in cooperation with the MPI-M. The functionality of unpackems and listems is very limited at the moment because the availability of slk retrieve is still limited. Please have a look at the packems manual for details and usage of packems: https://code.mpimet.mpg.de/projects/esmenv/wiki/Packems.

Is it possible to use listems to list files that were archived with packems on the HPSS?

Yes and no. Once slk retrieve is activated for interactive use, all files archived with packems onto the HPSS can be listed with listems. Currently, slk retrieve is not fully available and listems works only with INDEX.txt files stored on the lustre filesystem.

Is it possible to use unpackems to retrieve files that were archived with packems on the HPSS?

Yes and no. Once slk retrieve is fully activated, all files archived with packems onto the HPSS can be retrieved with unpackems. Currently, slk retrieve is not fully available. Therefore, individual INDEX.txt and *.tar files need to be retrieved manually.

Can you work directly with files in the archive (e.g. with Python)?

No, you have to download files to change them and archive them again.

Terminal cursor disappears after stopping an slk command. How to get it back?

If an slk command with a progress bar is canceled by the user, the shell cursor might disappear. One can make it reappear by (a) running reset or (b) starting vim and leaving it directly (:q!).

Is a file stored in the HSM cache already or exclusively on tape?

Solution a: In the output of slk list, please check the 11th character of the first column (permissions string). If this character is t then the file is exclusively stored on tape. If it is a - then the file is available from the HSM cache.

Solution b: Use slk_helpers iscached RESOURCE_PATH to check whether a file is available from the HSM cache (exit code is 0) or not (exit code is 1).
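Both solutions can be used in scripts. The following is a minimal sketch; the function names storage_state and is_cached are hypothetical, and the first function relies only on the rule given in solution a (11th character of the line).

```shell
# Hypothetical helper for solution a: classify one line of slk list
# output by the 11th character of the permissions string.
storage_state() {
    flag=$(printf '%s' "$1" | cut -c11)
    case "$flag" in
        t) echo "tape" ;;
        -) echo "cache" ;;
        *) echo "unknown"; return 2 ;;
    esac
}

# Hypothetical helper for solution b: succeeds if the file is available
# from the HSM cache, based on the exit code of slk_helpers iscached.
is_cached() {
    slk_helpers iscached "$1" > /dev/null 2>&1
}

# Usage (path is a placeholder):
#   if is_cached /arch/project/file.nc; then echo "retrieval will be fast"; fi
```

Lines of slk list output that are not file entries are reported as "unknown" by storage_state, so a loop over the full listing should be prepared to skip them.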

Common issues

Please see the extra page Known Issues

Changelog

v1.34, 19 April 2022

v1.33, 30 March 2022

v1.32, 28 February 2022

v1.31, 11 February 2022

v1.30, 06 December 2021

v1.29, 18 November 2021

  • modified: Session key has expired

v1.28, 12 November 2021

v1.27, 11 November 2021

v1.26, 01 November 2021

v1.25, 27 October 2021

v1.24, 23 October 2021

v1.23, 15 October 2021

v1.22, 08 October 2021

v1.21, 01 October 2021

  • new: Archival fails and Java NullPointerException in the log

v1.20, 29 September 2021

v1.19, 20 September 2021

  • changed title of FAQ

  • corrected FAQ’s Changelog

v1.18, 17 September 2021

  • added cross-references

  • minor layout changes

v1.17, 17 September 2021

v1.16, 17 August 2021

v1.15, 30 July 2021

v1.14, 12 July 2021

v1.13, 29 June 2021

v1.12, 08 June 2021

v1.11, 06 May 2021

v1.10, 23 April 2021

v1.09, 06 April 2021

v1.08, 12 March 2021

v1.07, 10 March 2021

v1.06, 08 March 2021

v1.05, 23 February 2021

v1.04, 22 February 2021

v1.03, 18 February 2021

v1.02, 12 February 2021

v1.01, 28 January 2021

  • first public version