DKRZ HSM (tape archive)

file version: 16 June 2026

Warning

The Versity HSM system is not active yet. The StrongLink system will be replaced by Versity (software) at the end of June 2026. We expect the Versity system to be online on 6 July 2026.

Introduction

DKRZ operates a hierarchical storage management (HSM) system for medium- to long-term storage of large volumes of data. We offer self-service archival and retrieval for medium-term storage, as well as curated long-term storage via DOKU or WDCC. Storage quotas for self-service archival and DOKU are allocated on a project basis rather than per user. Data stored through the self-service archival service is retained for the duration of the project plus one additional year. Details on curated long-term archival are available under Data Services -> Archiving & Preserving.

The software used to operate the HSM is ScoutAM by Versity. All self-service user interactions with the tape archive are handled by ATLAS, which offers both a command line interface and a web UI. Additionally, read-only mount points are provided on the Levante login and interactive nodes. Files archived via DOKU can also be accessed via ATLAS and via these mount points.

The hardware supporting the HSM consists of two geographically separated tape archives. The primary tape archive is located in the DKRZ building in Hamburg, where all archived files are stored. The secondary tape archive is located at the Max Planck Computing and Data Facility (MPCDF) in Garching. All files archived via curated long-term storage are duplicated to Garching. We offer two namespaces for self-service archival: one with duplication to Garching and twofold quota allocation, and one without duplication to Garching.

If you have questions that are not answered in this documentation, please contact us via support@dkrz.de.

Storage options, quota and file size

The tape archive performs best when files are sufficiently large but not too large. Please treat a file size of 300 GB as a soft upper limit. The Versity HSM system automatically combines small files into packs to improve write and read rates. Therefore, unlike with the two previous systems (HPSS and StrongLink), there is no general recommendation to pack small files. Nevertheless, each file is accounted for as at least 100 MB. This number is subject to change.

However, if you plan to archive a large number of small files, such as TB-sized Zarr datasets, please see the section on packing of data.
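The effect of the minimum accounted size can be estimated before archiving. The following is a minimal sketch, not an official DKRZ tool. It assumes that each file is charged as the larger of its real size and 100 MB, and that MB and GB are decimal units (10^6 and 10^9 bytes); neither detail is specified here, and the function names are hypothetical.

```python
from pathlib import Path

# Assumptions (not confirmed by this documentation): decimal units,
# and accounting as max(real size, minimum size) per file.
MIN_ACCOUNTED_BYTES = 100 * 10**6   # 100 MB minimum charged per file
SOFT_MAX_BYTES = 300 * 10**9        # 300 GB soft upper limit per file


def accounted_size(size_bytes):
    """Return the size charged for one file under the assumed minimum."""
    return max(size_bytes, MIN_ACCOUNTED_BYTES)


def summarize(sizes):
    """Compare raw and accounted volume and count files over the soft limit."""
    sizes = list(sizes)
    return {
        "files": len(sizes),
        "raw_bytes": sum(sizes),
        "accounted_bytes": sum(accounted_size(s) for s in sizes),
        "over_soft_limit": sum(1 for s in sizes if s > SOFT_MAX_BYTES),
    }


def file_sizes(root):
    """Yield the sizes in bytes of all regular files below root."""
    for path in Path(root).rglob("*"):
        if path.is_file():
            yield path.stat().st_size
```

Calling `summarize(file_sizes("/work/<project>/my_dataset"))` (a placeholder path) before archiving shows how much more quota many small files would consume than their raw volume suggests, which helps decide whether manual packing is worthwhile.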

The amount of data that can be stored in the tape archive per project is limited by the available storage quota of that project. Individual users do not have a quota. There is a regular tape archive quota for self-service archival and a separate quota for curated long-term archival (DOKU). Additionally, data can be long-term archived via the WDCC. The default root namespace for self-service archival is /arch/<project>. Files archived to this namespace are stored on one tape in Hamburg. Alternatively, users can archive very important files to /double/<project>, where they are stored twice: one copy in Hamburg and one copy in Garching. Long-term archival data is stored twice by default.

Storage space on the HSM is applied for in conjunction with the (bi-)annual application for DKRZ compute and storage resources. You can check your storage quota via https://luv.dkrz.de. The quota Archive Project denotes the quota for self-service archival and Archive Long Term denotes the quota for curated DOKU archival.
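When choosing between the two self-service namespaces, it can help to estimate the quota impact up front. The sketch below assumes that the "twofold quota allocation" mentioned for the duplicated namespace means data in /double counts twice against the Archive Project quota; this interpretation, like the function names, is an assumption and should be checked against the values shown at https://luv.dkrz.de.

```python
# Assumed quota multiplier per self-service namespace (hypothetical table):
# /arch keeps one copy, /double keeps two copies.
NAMESPACE_COPIES = {"arch": 1, "double": 2}


def quota_needed(data_bytes, namespace):
    """Return the quota consumed by data_bytes in the given namespace."""
    return data_bytes * NAMESPACE_COPIES[namespace]


def fits_quota(data_bytes, namespace, quota_bytes, used_bytes=0):
    """Check whether new data fits into the remaining project quota."""
    return used_bytes + quota_needed(data_bytes, namespace) <= quota_bytes
```

For example, `fits_quota(40 * 10**12, "double", 100 * 10**12, used_bytes=30 * 10**12)` evaluates to `False` under these assumptions, because 40 TB in /double would consume 80 TB of the 70 TB still available.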

The following table provides an overview:

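| Storage Type   | Root namespace    | Storage period               | Copy in Garching | Quota             |
|----------------|-------------------|------------------------------|------------------|-------------------|
| Self-service   | /arch/<project>   | project period plus 1 year   | no               | Archive Project   |
| Self-service   | /double/<project> | project period plus 1 year   | yes              | Archive Project   |
| Curated (DOKU) | /doku/<project>   | project period plus 10 years | yes              | Archive Long-Term |
| WDCC           |                   | > 10 years                   | (see WDCC)       | (see WDCC)        |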

Tools and Interfaces

Several tools and interfaces are provided for self-service archival and retrieval:

  • ATLAS

    • ATLAS WebGUI

    • ATLAS command line interface (acli)

  • read-only mounts /arch, /double and /doku on Levante login and interactive nodes (not on compute nodes)

  • Versity S3 Gateway (not available during the first days after the migration)

We recommend using ATLAS for all data transfers between the Lustre filesystem (/work and /scratch) and the tape archive. Please see Getting Started and the acli and ATLAS help pages. Getting Started also contains hints on how to use the read-only mount points efficiently.
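The read-only mount points can be browsed like any other directory tree. The following minimal sketch lists files and their sizes using metadata only, without opening any file. How reads of file contents behave on these mounts is not described here, so the sketch deliberately avoids them; the function name is hypothetical and /arch/<project> is a placeholder.

```python
import os


def list_files_with_sizes(root):
    """Return (path, size in bytes) for all regular files below root.

    Only directory listings and stat metadata are used; no file is opened.
    """
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            entries.append((full_path, os.stat(full_path).st_size))
    return entries
```

A call such as `list_files_with_sizes("/arch/<project>/some_experiment")` on a login or interactive node yields an inventory that can be used to plan a retrieval with ATLAS.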

Packing of data

The HSM software by Versity is configured to pack small files automatically. This process is not visible to users and cannot be controlled by them. By default, files are packed in the order in which they are transferred into the HSM.

Usually, users do not need to concern themselves with the automatic packing process. However, if several terabytes or more of small files are archived at once and data users are expected to access certain predefined subsets together, please contact support@dkrz.de in advance. Depending on the use case, we might set up specific packing policies.

Users who would like to pack their data manually can use packems.
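To illustrate the idea behind manual packing, the sketch below groups files into uncompressed tar archives of a bounded size using only the Python standard library. It is not packems and does not reproduce its interface; the function names and the target size are hypothetical. Files are grouped in the order given, so files that will be retrieved together should be listed next to each other.

```python
import tarfile
from pathlib import Path


def plan_packs(files, target_bytes):
    """Group (path, size) pairs, in order, into packs of at most target_bytes.

    A single file larger than target_bytes is placed in a pack of its own.
    """
    packs, current, current_size = [], [], 0
    for path, size in files:
        # Start a new pack if adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            packs.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        packs.append(current)
    return packs


def write_packs(packs, out_dir, prefix="pack"):
    """Write each pack as an uncompressed tar file and return the tar paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, members in enumerate(packs):
        tar_path = out_dir / f"{prefix}_{index:04d}.tar"
        with tarfile.open(tar_path, "w") as tar:
            for member in members:
                tar.add(member)
        written.append(tar_path)
    return written
```

A target well below the 300 GB soft upper limit keeps each tar file within the recommended size range, and the resulting archives can then be transferred with ATLAS like any other file.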

Data access via Python

In preparation.

External access

In preparation.