DKRZ HSM (tape archive)#
file version: 09 Jul 2024
Introduction#
The DKRZ operates a hierarchical storage management (HSM) system for medium- to long-term storage of large volumes of data. We offer self-service archival and retrieval for medium-term storage and curated long-term storage via DOKU or WDCC. Storage quota for self-service archival and for DOKU is granted per project, not per user. The storage period for self-service archival is the project lifetime plus one year. Details on curated long-term archival are provided in Data Services -> Archiving & Preserving.
The software installed to operate the HSM is StrongLink. All self-service user interactions with the tape archive are facilitated by StrongLink’s command line tool slk. Files archived via DOKU can also be retrieved via slk.
The hardware of the HSM consists of two spatially separated tape archives. The primary tape archive is located in the DKRZ building in Hamburg and consists of a 2.5 PB hard disk cache and multiple tape libraries providing more than 300 PB storage capacity. All archived files are stored here. The secondary tape archive is located at the Max Planck Computing and Data Facility (MPCDF) in Garching. All files archived via curated long-term storage are duplicated to Garching. We offer two namespaces for self-service archival: one with duplication to Garching and twofold quota allocation; one without duplication to Garching.
If you have questions that are not answered in this documentation, please contact us via support@dkrz.de.
Storage options, quota and file size#
The tape archive delivers its best performance if the files are sufficiently large. The transfer speed of files larger than approximately 250 GB between tape and HSM cache decreases due to cache limitations. Therefore, we recommend archiving files in the size range from 10 GB to 200 GB. We account at least 1 GB of quota per archived file and recommend packing small files. You can use packems to pack small files into tar balls and create indices automatically. Archivals of files larger than 500 GB have been tested successfully.
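The size guidance above can be turned into a quick check before archiving. The following Python sketch is illustrative only. The function names are hypothetical, it assumes decimal gigabytes (1 GB = 10^9 bytes), and it assumes that "at least 1 GB per file" means that files smaller than 1 GB are accounted as 1 GB. The exact rounding used for accounting may differ.

```python
GB = 10**9  # assumption: decimal gigabytes

MIN_ACCOUNTED = 1 * GB      # minimum quota accounted per archived file
RECOMMENDED_MIN = 10 * GB   # lower end of the recommended size range
RECOMMENDED_MAX = 200 * GB  # upper end of the recommended size range


def accounted_quota(file_sizes):
    """Return the quota in bytes accounted for a list of file sizes in bytes."""
    return sum(max(size, MIN_ACCOUNTED) for size in file_sizes)


def size_advice(size):
    """Classify one file size (bytes) against the recommended range."""
    if size < RECOMMENDED_MIN:
        return "small: consider packing"
    if size > RECOMMENDED_MAX:
        return "above recommended range"
    return "ok"


if __name__ == "__main__":
    sizes = [200 * 10**6, 50 * GB, 300 * GB]
    print("accounted quota in GB:", accounted_quota(sizes) / GB)
    for size in sizes:
        print(size, size_advice(size))
```

In this example, the 200 MB file is accounted as a full 1 GB, which is why packing many small files saves quota.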
The amount of data that can be stored in the tape archive per project is limited by the available storage quota of that project. Individual users do not have a quota. There is regular tape archive quota for self-service archival and separate quota for curated long-term archival via DOKU. Additionally, data can be long-term archived via the WDCC. The default root namespace for self-service archival is /arch/<project>. Files archived to this namespace are stored on one tape in Hamburg. Alternatively, users can archive very important files to /double/<project>, which are then stored twice: one copy in Hamburg and one copy in Garching. Long-term archival data is stored twice by default.
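The choice between the two self-service namespaces can be summarised in a few lines of Python. This is a sketch for illustration: the function names and the project name ab1234 are hypothetical, and it assumes that "twofold quota allocation" means a file in /double/<project> consumes twice its size in quota.

```python
def root_namespace(project, duplicate=False):
    """Return the self-service root namespace for a project.

    duplicate=True selects the namespace with a second copy in Garching.
    """
    return f"/double/{project}" if duplicate else f"/arch/{project}"


def quota_usage(size_bytes, duplicate=False):
    """Quota consumed by one file; assumes duplication counts twice."""
    return size_bytes * (2 if duplicate else 1)


if __name__ == "__main__":
    print(root_namespace("ab1234"))                  # single copy in Hamburg
    print(root_namespace("ab1234", duplicate=True))  # copies in Hamburg and Garching
```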
Storage space on the HSM is applied for in conjunction with the (bi-)annual application for DKRZ compute and storage resources. You can check your storage quota via https://luv.dkrz.de. The quota Archive Project denotes quota for self-service archival and Archive Long Term denotes quota for curated DOKU archivals.
The following table provides an overview:
| Storage Type | | root namespace | Storage period | copy in Garching | quota |
|---|---|---|---|---|---|
| self-service | | /arch/<project> | project period plus 1 year | no | Archive Project |
| self-service | | /double/<project> | project period plus 1 year | yes | Archive Project |
| curated | DOKU | | project period plus 10 years | yes | Archive Long Term |
| curated | WDCC | | | yes | |
Command line tools for tape access#
slk and slk_helpers#
The command line tool slk and its add-on slk_helpers allow users to interact with the HSM. slk is the official StrongLink command line tool. It lacks some useful features and its usability in scripts is limited. Therefore, the DKRZ developed slk_helpers as an add-on. slk_helpers can be extended on user request. On the Levante HPC system, both tools are installed system-wide and accessible to all users via module load slk.
Further reading:
basic usage examples: Getting Started
The functionality of the slk data retrieval command is limited on the Levante login nodes. It should be used on shared and interactive nodes.
Note
slk stores a login token in the home directory of each user (~/.slk/config.json). The login token is valid for 30 days. By default, this file can only be accessed by the respective user (permissions: -rw------- / 600). However, users should be careful when running commands like chmod 755 * in their home directory. If you suspect that your slk login token has been compromised, please contact support@dkrz.de.
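Whether the token file is still private can be verified with a short script. The following Python sketch uses only the standard library. The helper name is_private is hypothetical, and the script only inspects the file mode; it does not validate the token itself.

```python
import os
import stat
from pathlib import Path


def is_private(path):
    """Return True if neither group nor others have any permission on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    token = Path.home() / ".slk" / "config.json"
    if not token.exists():
        print(f"{token} not found")
    elif is_private(token):
        print(f"{token} is only accessible by its owner")
    else:
        print(f"{token} is accessible by others; consider: chmod 600 {token}")
```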
SLURM job wrapper scripts#
DKRZ provides SLURM job wrapper scripts that facilitate certain StrongLink tasks which require waiting time or should run in the background. These scripts combine slk and slk_helpers commands and submit one or multiple SLURM jobs. They are loaded together with slk and slk_helpers via the module slk.
packems: packing of data#
The tape archive delivers its best performance if the archived files are in the size range of 10 GB to 250 GB. Therefore, small files should be packed into tar balls prior to archiving. For this purpose, MPI-M and DKRZ developed packems. packems depends on slk, slk_helpers and pyslk (see below).
Users can provide a list of files to packems. packems then automatically distributes them into tar balls with a targeted size of 100 GB. It also creates an index file which records the tar ball in which each individual file is located. Please use module load packems to load the packems tool on Levante.
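The idea of distributing files into tar balls of a target size and keeping an index can be illustrated with a small Python sketch. This is not the algorithm used by packems, which is not described here. It is a simple greedy grouping in input order under stated assumptions: decimal gigabytes, a hypothetical function name plan_tarballs, and an in-memory index instead of an index file.

```python
GB = 10**9          # assumption: decimal gigabytes
TARGET = 100 * GB   # targeted tar ball size


def plan_tarballs(files, target=TARGET):
    """Group (name, size) pairs into tar balls of roughly `target` bytes.

    Returns (tarballs, index): tarballs is a list of lists of file names,
    index maps each file name to the position of its tar ball in that list.
    A single file larger than `target` gets a tar ball of its own.
    """
    tarballs, index = [], {}
    current, current_size = [], 0
    for name, size in files:
        # Close the current tar ball if this file would push it past the target.
        if current and current_size + size > target:
            tarballs.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
        index[name] = len(tarballs)  # position the open tar ball will get
    if current:
        tarballs.append(current)
    return tarballs, index


if __name__ == "__main__":
    files = [("a.nc", 60 * GB), ("b.nc", 30 * GB), ("c.nc", 50 * GB)]
    tarballs, index = plan_tarballs(files)
    print(tarballs)
    print(index)
```

The index is what makes a later lookup cheap: to retrieve one file, only the tar ball named in the index has to be fetched from tape.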
StrongLink automatically imports metadata from netCDF files into the DKRZ-internal StrongLink metadata database (see below). This StrongLink feature does not work with packed files. Hence, users need to weigh the advantages of packing against the automatic metadata import.
Further reading:
- basic usage example
- extended usage examples
- packems manual
pyslk: python slk wrapper#
We offer a Python library for interacting with the tape archive. Technically, the library consists of wrappers around slk and slk_helpers. Therefore, it requires an environment where slk and slk_helpers are installed.
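Before using the library, a script can check that both command line tools are reachable. The sketch below does not use pyslk itself; it only looks for the two executables on PATH with the Python standard library. The function name missing_tools is hypothetical.

```python
import shutil

REQUIRED_TOOLS = ("slk", "slk_helpers")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("not found on PATH:", ", ".join(missing))
        print("on Levante, try: module load slk")
    else:
        print("slk and slk_helpers are available")
```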
External access#
The tape archive can only be accessed via Levante.
Metadata: harvesting, manual manipulation and search#
The StrongLink software reads and extracts extended file metadata from the headers of archived netCDF files and some other file types. Users may edit some of these metadata and add further metadata via slk tag and slk_helpers json2hsm. The existing metadata are described in detail on our metadata manual page (Reference: metadata schemata). Stored metadata can be printed via slk_helpers metadata, slk tag -display and slk_helpers hsm2json. These three commands are meant for different use cases (see File Search and Metadata).
Files can be searched and found based on their metadata. The StrongLink software provides the command line tools slk search, slk list and slk retrieve to search, list and retrieve files based on their metadata, respectively. Retrieval of files based on their absolute path in the HSM is also possible. Please see the sections Reference: metadata schemata, slk Usage Examples and StrongLink Command Line Interface (slk) for details on the usage of slk in this context.
Backend data handling#
As with the previous HSM system, HPSS, a fast disk cache is installed upstream of the tape system. Files selected for archival are first copied to the disk cache and then successively written onto tape. Files selected for retrieval are first copied from tape to the cache and then copied to the specified target locations. Retrieving files that are still or already stored in the disk cache is considerably faster than retrieving files that are located on tape only.
The distribution of the files in the disk cache, primary tape archive and secondary tape archive is automatically controlled by the software StrongLink.
A - or t is appended to the permissions string of each file in the output of slk list and slk_helpers search_list. The - indicates that the file is stored in the cache. The t indicates that the file is stored on tape. Alternatively, the command slk_helpers iscached can be used to check whether a file is currently stored in the cache.
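The appended flag can be evaluated in a script. The following Python sketch is based on assumptions about the output layout that are not specified here: it assumes a 10-character POSIX-style permissions string followed by the single flag character, and the example strings are made up for illustration. The function name storage_state is hypothetical.

```python
def storage_state(permissions):
    """Map the flag appended to a permissions string to a storage state.

    Assumes 10 permission characters plus one appended flag:
    '-' means the file is in the cache, 't' means it is on tape.
    """
    if len(permissions) != 11:
        raise ValueError(f"expected 11 characters, got {permissions!r}")
    flag = permissions[-1]
    if flag == "-":
        return "cache"
    if flag == "t":
        return "tape"
    raise ValueError(f"unexpected storage flag in {permissions!r}")


if __name__ == "__main__":
    # Hypothetical permission strings, not copied from real output.
    for perms in ("-rw-r--r---", "-rw-r--r--t"):
        print(perms, "->", storage_state(perms))
```

The length check matters because an ordinary permissions string without the flag can also end in "-" and would otherwise be misread as "cache".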
Further reading#
- Getting Started with slk
- Known Issues (read this!)
  - slk/slk_helpers terminate directly after start with “Name or service not known” or “Unhandled error occurred”
  - slk list prints “Error getting namespace children for namespace:”
  - slk archive/retrieve may use much memory and CPU time – careful with parallel slk calls on one node
  - slk archive/retrieve is killed
  - slk is hanging / unresponsive
  - Non-recursive is semi-recursive!?
  - slk writes no output in non-interactive mode
  - slk never writes to stderr
  - slk move cannot rename files
  - slk archive compares file size and timestamp prior to overwriting files
  - Availability of archived data and modified metadata is delayed by a few seconds
  - A file listed by slk list is not necessarily available for retrieval yet
  - failed/canceled slk archive calls leave incomplete files
  - slk does not have a --version flag
  - slk performance on different node types
  - group memberships of user updated on login
  - LDAP user not known to StrongLink prior to first login
  - Filtering slk list results with “*”
  - How to search non-recursively in a namespace
  - Terminal cursor disappears if slk command with progress bar is canceled
  - error “conflict with jdk/…” when the slk module is loaded
  - slk needs at least Java version 13
  - slk search yields RQL parse error
  - slk login asks me to provide a hostname and/or a domain
  - Archival fails and Java NullPointerException in the log
  - slk ERROR: Unhandled error occurred, please check logs
  - slk archive: Exception …: lateinit property websocket has not been initialized
  - slk delete failed, but nevertheless file was deleted
  - slk list will take very long when many search results are found
  - slk search -user and -group do not work
  - “Connection reset”, “Connection timeout has expired”, “Name or service not known”, “Unable to resolve hostname” and “Host not reachable” errors
- Archivals to tape
- Retrievals from tape
- File Search and Metadata
- slk usage examples
  - Obtain Access Token
  - Check if access token is still valid
  - Archival
  - Search files
  - Generate search queries
  - List files
  - Print file and namespace size
  - Retrieve files
  - tag files (set metadata)
  - print metadata (display tags)
  - Change permissions and group of files and directories
  - Get user/group IDs and names
  - slk in batch jobs on compute nodes
- manual pages HSM
  - slk: official StrongLink Command Line Interface
  - slk helpers: slk extension provided by DKRZ
  - slk wrappers: SLURM wrapper scripts to simplify StrongLink-tasks
  - Reference: metadata schemata
  - Reference: StrongLink query language
  - JSON structure for/of metadata import/export
  - Official StrongLink Command Line Interface Guide
- FAQ