HSM StrongLink (tape archive)#
file version: 01 Dec 2023
Introduction#
The DKRZ operates a hierarchical storage management (HSM) system used to store all relevant data created and post-processed on DKRZ systems. The hardware of the HSM consists of a disk cache and two tape libraries. The primary tape archive has a storage capacity of more than 300 PB and is located in the DKRZ building in Hamburg. Selected files are mirrored to the secondary tape archive located at the Max Planck Computing and Data Facility (MPCDF) in Garching. The software installed to operate the HSM is StrongLink. All command-line-based user interaction with the tape archive goes through StrongLink and its command line tool `slk`.
If you have questions that are not answered on this page or on the linked pages, please consult our FAQ. If you do not find an answer there, please contact us via support@dkrz.de.
Storage options, quota and file size#
The tape archive delivers its best performance if the files are sufficiently large. Therefore, we recommend archiving files in the size range from 10 GB to 500 GB. The accounting of used quota is done in increments of 1 GB per archived file. You can use `packems` to pack small files into tarballs and create indices automatically. Archivals of files larger than 500 GB have been tested successfully.
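The effect of the per-file accounting can be illustrated with a little arithmetic. The following is a minimal sketch, assuming that each file is rounded up to the next full GB and that 1 GB means 10^9 bytes (the page does not state whether GB or GiB is used); the function name `billed_quota_gb` is made up for this example.

```python
GB = 10**9  # assumption: decimal gigabytes


def billed_quota_gb(file_sizes_bytes):
    """Return the quota in GB if every file is rounded up to a full GB."""
    # Integer ceiling division avoids floating point rounding issues.
    return sum((size + GB - 1) // GB for size in file_sizes_bytes)


if __name__ == "__main__":
    small_files = [10 * 10**6] * 1000   # 1000 files of 10 MB each
    one_tarball = [sum(small_files)]    # the same data packed into one file
    print("unpacked:", billed_quota_gb(small_files), "GB")
    print("packed:  ", billed_quota_gb(one_tarball), "GB")
```

Under these assumptions, many small files consume far more quota than the same data packed into one large file.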
The amount of data that can be stored in the tape archive per project is limited by the available storage quota of that project. Individual users do not have a quota. Storage space on the HSM is applied for in conjunction with the (bi-)annual application for DKRZ compute and storage resources. You can check your storage quota via https://luv.dkrz.de . There is normal tape archive quota, denoted as `arch`, and quota for long-term archival, denoted as `docu`. Additionally, users may select very important files to be stored twice, i.e. one copy in Hamburg and one copy in Garching. The following table provides an overview:
| File Storage | Storage Time | Used quota | How to achieve this | Past location (HPSS) | New location (StrongLink) |
|---|---|---|---|---|---|
| single copy on tape | 1 year after expiration of DKRZ project | | default storage type | /hpss/arch/<prj> | /arch/<prj> |
| second copy on separate tape | 1 year after expiration of project | | store data in specific root namespace (see right column) | /hpss/double/<prj> | /double/<prj> |
| long-term storage for reference purpose | 10 years after expiration of project | | Please contact data@dkrz.de | /hpss/doku/<prj> | /doku/<prj> |
slk: command line tool for tape access#
Note
`slk` stores a login token in the home directory of each user (`~/.slk/config.json`). The login token is valid for 30 days. By default, this file can only be accessed by the respective user (permissions: `-rw-------`/`600`). However, users should be careful when running commands such as `chmod 755 *` in their home directory. If you suspect that your slk login token has been compromised, please contact support@dkrz.de.
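Whether the token file still has owner-only permissions can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `is_private` is made up for this example, and the path is the one given in the note above.

```python
import os
import stat
from pathlib import Path


def is_private(path):
    """Return True if neither group nor others have any permission bit set."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


if __name__ == "__main__":
    token = Path.home() / ".slk" / "config.json"
    if not token.exists():
        print(f"{token} not found")
    elif is_private(token):
        print(f"{token} is only accessible by its owner")
    else:
        print(f"{token} is accessible by others; consider 'chmod 600 {token}'")
```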
The StrongLink software comes with a command line tool suite `slk`. `slk` is the user interface to the StrongLink software and allows the user to interact with the HSM. The available commands are:
- `help`: displays the `slk` help page
- `version`: prints the `slk` version
- `login`: log in to the system with LDAP credentials
- `archive`: copy files to the HSM
- `chmod`: modify permissions of archived files (same as `chmod` on the Linux shell)
- `delete`: delete a namespace (and all child objects for the namespace) or a specific file
- `group`: change group ownership of archived files; for file owners and admins only
- `owner`: change ownership of archived files; for admins only
- `tag`: modify metadata of archived files
- `search`: search archived files based on metadata
- `list`: list searched files and some of their metadata (similar to `ls` on the Linux shell)
- `retrieve`: retrieve files based on search result or based on absolute path; retrieves not more than 500 files; limited availability, see Retrievals from tape
- `recall`: recall files based on search result or based on absolute path (recall: copy from tape to HSM cache)
- `move`: move a file or a namespace from one namespace to another namespace (might be merged with `slk rename` in future)
- `rename`: rename a file or a namespace (might be merged with `slk move` in future)
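Because `slk retrieve` retrieves not more than 500 files, a long list of files may have to be split into smaller batches beforehand. The following is a minimal sketch of such a split, assuming that the limit applies per call; the helper name `batches` and the example paths are hypothetical, and no `slk` command is run here.

```python
def batches(paths, max_files=500):
    """Yield consecutive slices of `paths` with at most `max_files` entries."""
    for start in range(0, len(paths), max_files):
        yield paths[start:start + max_files]


if __name__ == "__main__":
    # hypothetical file paths, for illustration only
    paths = [f"/arch/<prj>/file_{i:04d}.nc" for i in range(1201)]
    for number, batch in enumerate(batches(paths), start=1):
        print(f"batch {number}: {len(batch)} files")
```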
Note
StrongLink uses the term “namespace” or “global namespace” (gns). A “namespace” is comparable to a “directory” or “path” on a common file system.
Note
`slk` does not provide its own shell. `slk login` simply creates a login token which allows other `slk` commands to be used. It is not possible to navigate through the emulated directory structure of the HSM using a `cd`-like command. Instead, each `slk` command needs the full path of resources as input.
Note
On Levante login nodes, `slk retrieve` can only retrieve one file at a time. There are no limitations for retrievals on nodes of the `compute`, `shared` and `interactive` partitions. All other `slk` commands are available without any limitations on all nodes. If you wish to archive or retrieve large files or many files interactively, please start an interactive batch session on the `interactive` partition with `salloc` (see Run slk in the “interactive” partition and Data Processing on Levante).
Please read Known Issues before you use `slk` for the first time. For a detailed description of the individual commands, please see StrongLink Command Line Interface (slk) (on docs.dkrz.de) or the StrongLink Command Line Interface Guide. Alternatively, the page slk Usage Examples contains several usage examples.
slk helpers: add-on to slk#
`slk` lacks a few minor but very useful features. The `slk_helpers` program adds these features. Its usage is very similar to the usage of `slk`. The following commands are available:
- `checksum`: prints one or all checksums of a resource
- `exists`: check if a resource exists; the resource id is returned
- `help`: print help / usage information
- `gen_file_query`: generate a query to find selected files
- `gen_search_query`: generate a query to find files which meet the provided conditions
- `group_files_by_tape` / `gfbt`: groups a list of files based on which tape they are stored on
- `is_admin_session`: check if the user is currently logged in as normal user or admin user
- `json2hsm`: import metadata from a JSON file into the HSM
- `job_exists`: check if a tape read job with the given ID exists
- `job_queue`: prints status of the queue of tape read jobs
- `job_report`: fetch raw verify job report; please use `result_verify_job` instead if possible
- `job_status`: check the status of a tape read job with the given ID
- `hostname`: prints the hostname to which slk is currently connected or to which slk will connect
- `hsm2json`: export file metadata as JSON
- `iscached`: checks if a file is stored in the HSM cache
- `is_on_tape`: checks if a file is stored on tape (output independent of the caching status)
- `list_search`: prints paths of all resources found by search id
- `mkdir`: create a namespace in an already existing namespace (like `mkdir` on the Linux shell)
- `metadata`: get metadata of a resource
- `print_rcrs`: print size and checksums of file parts; some HPSS files are stored as two parts on two tapes
- `resource_path` / `resourcepath`: get path for a resource id
- `resource_permissions`: get permissions for a resource path or id
- `resource_type`: get type for a resource path or id (`FILE` or `NAMESPACE`)
- `result_verify_job`: list relevant errors found by a verify job
- `search_immediately` (hidden / expert command): starts search and returns search id immediately; fewer error checks than in `slk search` (strange/no error messages)
- `search_incomplete`: prints whether the search is incomplete (still running)
- `search_limited`: please use `slk search` instead; submits a search and returns search ID if 1000 or fewer results are found
- `search_successful`: prints whether the search was successful
- `session`: prints until when the current slk session is valid
- `size`: returns file size in bytes (recursive size of namespace with `-R`)
- `submit_verify_job`: run a verify job for a provided set of files
- `tape_barcode`: return tape barcode for given tape id
- `tape_exists`: check if a tape exists
- `tape_id`: return tape id for given tape status
- `tape_status`: returns the status of a tape for retrieval operations (write operations block tapes for reading)
- `total_number_search_results` (hidden / expert command): print total number of search results of a search id; results which are not visible to the current user are counted as well
- `version`: prints the version
Please see slk helpers (on docs.dkrz.de) for a detailed description of the individual commands. Alternatively, the section slk Usage Examples contains several usage examples.
slk wrappers: bash SLURM job wrappers#
Certain tasks related to StrongLink require the user to wait because requests might be queued. Other tasks are worth repeating at regular intervals. For these two purposes, we provide multiple SLURM job scripts. These scripts can be submitted manually with `sbatch`, or simple wrapper scripts can be used instead. All related scripts are part of all `slk` modules on Levante whose full name matches `slk/?.?.?_h?.?.?_w?.?.?`. The `?.?.?` are the version numbers of the included `slk`, `slk_helpers` and `slk wrappers`. These wrapper scripts are currently available:
- `slk_wrapper_daily_login_check`: submit a daily check of the validity of the StrongLink login token; if the token is due to expire, an email is sent to the user
- `slk_wrapper_recall_wait_retrieve`: similar to a simple `slk retrieve` but saves node hours; submit a recall to StrongLink followed by SLURM jobs which wait until all requested files are in the cache; then, a retrieval to the Lustre file system is started
- `slk_wrapper_version`: check the version of the used set of wrappers
- `slk_wrapper_weekly_verify_job`: submit a weekly job which verifies the size of all cached files of the current user; compares actual file size with file size expected by StrongLink
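The module naming scheme `slk/?.?.?_h?.?.?_w?.?.?` can be split into its three version numbers with a regular expression. This is a minimal sketch, assuming that each version consists of dot-separated numbers (possibly with more than one digit per component); the function name and the version numbers in the example are made up.

```python
import re

# slk/<slk version>_h<slk_helpers version>_w<slk wrappers version>
MODULE_PATTERN = re.compile(
    r"^slk/(?P<slk>\d+(?:\.\d+)*)"
    r"_h(?P<helpers>\d+(?:\.\d+)*)"
    r"_w(?P<wrappers>\d+(?:\.\d+)*)$"
)


def parse_slk_module(name):
    """Return a dict with the three versions, or None if the name does not match."""
    match = MODULE_PATTERN.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # hypothetical module name, for illustration only
    print(parse_slk_module("slk/1.2.3_h4.5.6_w7.8.9"))
```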
pyslk: python slk wrapper#
We offer a Python wrapper package for `slk` and `slk_helpers`. Most commands of these two command line interfaces have corresponding wrappers in `pyslk`. Usage examples of `pyslk` are shown in the section pyslk in Getting Started with slk. Additionally, a pyslk API reference is provided. Feel free to download the package from pyslk availability.
Packing of data#
The tape archive delivers its best performance if the files to be archived are sufficiently large. The recommendations on packing remain effective for the time being. As with the HPSS system, the accounting of used quota is done in increments of 1 GB per archived file. However, the automatic metadata import from netCDF files does not work when they are packed. Hence, it has to be weighed for each use case whether lower consumption of storage space or enriched file metadata is more valuable.
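To illustrate what packing means in practice, the following minimal sketch groups small files into bundles of a target size before they would be written into tarballs. It is a simple greedy grouping for illustration only and is not the algorithm used by any particular packing tool; the function name, file names and sizes are hypothetical.

```python
def group_into_bundles(files, target_bytes):
    """Group (name, size) pairs in order; start a new bundle when the
    next file would push the current bundle above target_bytes."""
    bundles, current, current_size = [], [], 0
    for name, size in files:
        if current and current_size + size > target_bytes:
            bundles.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        bundles.append(current)
    return bundles


if __name__ == "__main__":
    GB = 10**9  # assumption: decimal gigabytes
    # hypothetical files of 3 GB each, grouped into bundles of up to 10 GB
    files = [(f"out_{i:02d}.nc", 3 * GB) for i in range(7)]
    for bundle in group_into_bundles(files, 10 * GB):
        print(bundle)
```

A file that is larger than the target size ends up in a bundle of its own.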
The package `packems`, which was developed by MPI-M and DKRZ for the HPSS, has been adapted to the StrongLink system. This package simplifies packing and archiving multiple data files to tape as well as retrieving them. It consists of three command line programs:
- a pack-&-archive tool `packems`,
- a list-archived-content tool `listems`, and
- a retrieve-&-unpack tool `unpackems`.
Please use `module load packems` to load the packems package. For details on the usage of `packems`, please see the packems manual.
External access via sftp#
Currently, the HSM can only be accessed via Levante.
Metadata: harvesting, manual manipulation and search#
The StrongLink software reads and extracts extended file metadata from the headers of archived netCDF files and some other file types. Users may edit some of these metadata and add further metadata via `slk tag` and `slk_helpers json2hsm`. The existing metadata are described in detail on our metadata manual page (Reference: metadata schemata). Stored metadata can be printed via `slk_helpers metadata`, `slk tag -display` and `slk_helpers hsm2json`. These three commands are meant for different use cases (see File Search and Metadata).
Files can be searched and found based on their metadata. The StrongLink software provides the command line tools `slk search`, `slk list` and `slk retrieve` to search, list and retrieve files based on their metadata, respectively. Retrieval of files based on their absolute path in the HSM is also possible. Please see the sections Reference: metadata schemata, slk Usage Examples and StrongLink Command Line Interface (slk) for details on the usage of `slk` in this context.
Backend data handling#
As with the previous HSM system HPSS, a fast disk cache is installed upstream of the tape system. Files selected for archival are first copied to the disk cache and then successively written onto tape. Files selected for retrieval are first copied from tape to the cache and then copied to the specified target locations. The retrieval of files that are still/already stored in the disk cache is considerably faster than the retrieval of files that are located on tape only.
The distribution of the files in the disk cache, primary tape archive and secondary tape archive is automatically controlled by the software StrongLink.
A `-` or `t` is appended to the permissions string of each file in the output of `slk list` and `slk_helpers list_search`. The `-` indicates that the file is stored in the cache. The `t` indicates that the file is stored on tape. Alternatively, the command `slk_helpers iscached` can be used to check whether a file is currently stored in the cache.
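The appended character can be evaluated in a script. The following is a minimal sketch, assuming that the permissions string is the first whitespace-separated field of a listing line and consists of the usual ten characters plus the appended one; the function name and the sample lines are made up, and the exact column layout of the real output is not specified on this page.

```python
def storage_location(listing_line):
    """Return 'cache' or 'tape' based on the character appended to the
    permissions string, or None if the line does not look as assumed."""
    fields = listing_line.split()
    if not fields or len(fields[0]) != 11:
        return None
    # '-' means stored in the cache, 't' means stored on tape
    return {"-": "cache", "t": "tape"}.get(fields[0][-1])


if __name__ == "__main__":
    # hypothetical listing lines, for illustration only
    for line in ("-rw-r--r--t user group file_a.nc",
                 "-rw-r--r--- user group file_b.nc"):
        print(line.split()[-1], "->", storage_location(line))
```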
Further reading#
- Getting Started with slk
- Known Issues (read this!)
- slk issues on Levante
- slk archive/retrieve may use much memory and CPU time – careful with parallel slk calls on one node
- slk archive/retrieve is killed
- slk is hanging / unresponsive
- Non-recursive is semi-recursive!?
- slk writes no output in non-interactive mode
- slk never writes to stderr
- slk move cannot rename files
- slk archive compares file size and timestamp prior to overwriting files
- Availability of archived data and modified metadata is delayed by a few seconds
- A file listed by slk list is not necessarily available for retrieval yet
- failed/canceled slk archive/retrieve calls leave file fragments
- slk does not have a --version flag
- slk performance on different node types
- group memberships of user updated on login
- LDAP user not known to StrongLink prior to first login
- Filtering slk list results with “*”
- How to search non-recursively in a namespace
- Terminal cursor disappears if slk command with progress bar is canceled
- error “conflict with jdk/…” when the slk module is loaded
- slk needs at least Java version 13
- slk search yields RQL parse error
- slk login asks me to provide a hostname and/or a domain
- Archival fails and Java NullPointerException in the log
- slk ERROR: Unhandled error occurred, please check logs
- slk archive: Exception …: lateinit property websocket has not been initialized
- slk delete failed, but nevertheless file was deleted
- slk list will take very long when many search results are found
- slk search -user and -group do not work
- “Connection reset”, “Connection timeout has expired”, “Name or service not known”, “Unable to resolve hostname” and “Host not reachable” errors
- Archivals to tape
- Retrievals from tape
- File Search and Metadata
- slk usage examples
- Obtain Access Token
- Check if access token is still valid
- Archival
- Search files
- Generate search queries
- List files
- Retrieve files
- tag files (set metadata)
- print metadata (display tags)
- Change permissions and group of files and directories
- Get user/group IDs and names
- slk in batch jobs on compute nodes
- manual pages HSM
- slk: official StrongLink Command Line Interface
- slk helpers: slk extension provided by DKRZ
- slk wrappers: wrapper scripts to simplify StrongLink-tasks
- Reference: metadata schemata
- Reference: StrongLink query language
- JSON structure for/of metadata import/export
- Reference: StrongLink verify jobs
- Official StrongLink Command Line Interface Guide
- FAQ