
HSM StrongLink (tape archive)

file version: 02 May 2022

Introduction

The DKRZ operates a hierarchical storage management (HSM) system used to store all relevant data created and post-processed on DKRZ systems. The hardware of the HSM consists of a disk cache and two tape libraries. The primary tape archive is located in the DKRZ building in Hamburg. Selected files are mirrored to the secondary tape archive at the Max Planck Computing and Data Facility (MPCDF) in Garching. The software operating the HSM is StrongLink. All command-line user interaction with the tape archive goes through StrongLink and its command line tool slk.

If you have questions that are not answered on this page or on the pages linked under Further reading, please have a look at our FAQ. If you do not find an answer there, please contact us via support@dkrz.de.

Storage options and quota

The amount of data that can be stored in the tape archive per project is limited by that project's storage quota; individual users do not have a quota. Storage space on the HSM is applied for in conjunction with the (bi-)annual application for DKRZ compute and storage resources. You can check your storage quota via https://luv.dkrz.de. Normal tape archive quota is denoted as arch; quota for long-term archival is denoted as docu. Additionally, users may select very important files to be stored twice, i.e. one copy in Hamburg and one copy in Garching. The following table provides an overview:

Storage location, time and quota

| File Storage | Storage Time | Used quota | How to achieve this | Past location (HPSS) | New location (StrongLink) |
|---|---|---|---|---|---|
| single copy on tape | 1 year after expiration of DKRZ project | arch quota | default storage type | /hpss/arch/<prj> | /arch/<prj> |
| second copy on separate tape | 1 year after expiration of project | arch quota; double file size used | store data in the specific root namespace (see rightmost column) | /hpss/double/<prj> | /double/<prj> |
| long-term storage for reference purpose | 10 years after expiration of project | docu quota | please contact data@dkrz.de | /hpss/doku/<prj> | /doku/<prj> |

The tape archive delivers its best performance if the files to be archived are sufficiently large. The recommendations on packing developed for HPSS therefore remain in effect for the time being. As with the HPSS system, used quota is accounted in increments of 1 GB per archived file.
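As a rough illustration of the accounting rule above, the billed quota per file can be sketched as follows. This assumes decimal gigabytes and round-up behaviour; billed_gb is a hypothetical helper, not part of slk:

```shell
# Hypothetical helper illustrating quota accounting in 1 GB increments
# per archived file (assuming decimal gigabytes and rounding up).
billed_gb() {
  local size_bytes=$1
  local gb=$((1000 * 1000 * 1000))
  echo $(( (size_bytes + gb - 1) / gb ))   # round up to full GB
}

billed_gb 1            # a 1-byte file still occupies 1 GB of quota
billed_gb 2500000000   # a 2.5 GB file occupies 3 GB of quota
```

This accounting is one reason why many small files should be packed into fewer large archives before archival (see Packing of data below).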

slk: command line tool for tape access

Note

slk stores a login token in the home directory of each user (~/.slk/config.json). The login token is valid for 30 days. By default, this file can only be accessed by the respective user (permissions: -rw-------/600). However, users should be careful when doing things like chmod 755 * in their home directory. If you assume that your slk login token has been compromised, please contact support@dkrz.de.
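To verify that the token file has not been opened up accidentally, its mode can be checked and tightened. The sketch below uses a temporary stand-in file so it can be run anywhere; in practice, replace it with ~/.slk/config.json:

```shell
# Check and restore owner-only permissions (600) on a file; a temporary
# file stands in for ~/.slk/config.json here.
token="$(mktemp)"
chmod 644 "$token"                 # simulate accidentally widened permissions

if [ "$(stat -c '%a' "$token")" != "600" ]; then
  chmod 600 "$token"               # restore -rw------- (owner-only access)
fi
stat -c '%a' "$token"              # prints: 600
rm -f "$token"
```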

The StrongLink software comes with a command line tool, slk. slk is the user interface to the StrongLink software and allows users to interact with the HSM. The available commands are:

  • help: displays the slk help page

  • version: print the slk version

  • login: log in to the system with LDAP credentials

  • archive: copy files to the HSM

  • chmod: modify permissions of archived files (same as chmod on the Linux shell)

  • delete: delete a namespace (and all child objects of the namespace) or a specific file

  • group: change group ownership of archived files; for file owners and admins only

  • owner: change ownership of archived files; for admins only

  • tag: modify metadata of archived files

  • search: search archived files based on metadata; deactivated, please see slk_helpers search_limited instead

  • list: list searched files and some of their metadata (similar to ls on the Linux shell)

  • retrieve: retrieve files based on a search result or on an absolute path; retrieves at most 500 files; limited availability, see Retrievals from tape

  • recall: recall files based on search result or based on absolute path (needed for External Access only)

  • move: move a file or a namespace from one namespace to another namespace (might be merged with slk rename in future)

  • rename: rename a file or a namespace (might be merged with slk move in future)

Note

StrongLink uses the term “namespace” or “global namespace” (gns). A “namespace” is comparable to a “directory” or “path” on a common file system.

Note

After logging in to the system, slk does not provide its own shell; the user still navigates through the local file system, i.e. the parallel file systems of Mistral and Levante. slk therefore behaves more like the cp command on the Linux shell. It is also not possible to navigate through the emulated directory structure of the HSM using slk.

Note

Currently (May 2022), slk retrieve is only available on nodes of the slk, compute, shared and interactive partitions on Levante. All other slk commands are available on all nodes. If you wish to use slk retrieve interactively, please start an interactive batch session via the interactive partition with salloc (Data Processing on Levante).

Please read Known Issues before using slk for the first time. For a detailed description of the individual commands, please have a look at StrongLink Command Line Interface (slk) (on doc.dkrz.de) or at the StrongLink Command Line Interface Guide v3.1. Additionally, the sections Switching: pftp to slk and slk Usage Examples contain several usage examples.
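Since slk retrieve is restricted to certain partitions, retrievals are typically run inside a batch job. A minimal sketch of such a job script follows; the account and both paths are placeholders, and the resource settings may need adjustment for your use case:

```shell
#!/bin/bash
#SBATCH --job-name=slk_retrieve
#SBATCH --partition=shared        # slk retrieve runs on the slk, compute,
                                  # shared and interactive partitions
#SBATCH --account=xz0123          # hypothetical project account
#SBATCH --time=08:00:00

module load slk

# Retrieve one file from the tape archive into scratch
# (both paths are placeholders).
slk retrieve /arch/xz0123/example/data.tar /scratch/x/x000000/
```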

slk helpers: add-on to slk

slk lacks a few minor but very useful features, which the slk_helpers program adds. Its usage is very similar to that of slk. The following commands are available:

  • checksum: prints one or all checksums of a resource

  • exists: check if a resource exists; the resource id is returned

  • help: print help / usage information

  • hostname: prints the hostname to which slk is currently connected or will connect

  • iscached: checks if a file is in the HSM cache

  • mkdir: create a namespace in an already existing namespace (like mkdir on the Linux shell)

  • metadata: get metadata of a resource

  • resourcepath: get path for a resource id

  • search_limited: submits a search and returns a search ID if 1000 or fewer results are found

  • session: prints until when the current slk session is valid

  • size: returns the file size in bytes

  • version: prints the version

Please have a look at slk helpers (on doc.dkrz.de) for a detailed description of the individual commands. Alternatively, the section slk Usage Examples contains several usage examples.
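For example, before starting a large retrieval it can be useful to check which files are still in the disk cache. A sketch, assuming that slk_helpers iscached signals the cache status via its exit code and using placeholder paths:

```shell
# Check the cache status of a list of archived files (placeholder paths).
for f in /arch/xz0123/example/file1.nc /arch/xz0123/example/file2.nc; do
  if slk_helpers iscached "$f"; then
    echo "$f: in cache, retrieval will be fast"
  else
    echo "$f: on tape only, retrieval involves a tape mount"
  fi
done
```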

pyslk: python slk wrapper

We offer a Python wrapper package for slk and slk_helpers. Most commands of these two command line interfaces have corresponding wrappers in pyslk. Usage examples of pyslk are shown in the section pyslk in Getting Started with slk. Additionally, a pyslk API reference is provided. Feel free to download the package from pyslk availability.

External access via sftp

Currently, no alternative to data access via Mistral is offered.

See the extra page on External Access.

Metadata: harvesting, manual manipulation and search

The StrongLink software reads and extracts extended file metadata from the headers of archived netCDF files. Users may edit some of these metadata and add further metadata via slk tag. The metadata that are saved are described in detail on our metadata manual page (Reference: metadata schemata). Additionally, extended metadata are extracted from some common file formats such as JPEG. Harvesting of extended metadata from additional file formats used in Earth System modeling is planned.

Files can be searched and found based on their metadata. The StrongLink software provides the command line tools slk search, slk list and slk retrieve to search, list and retrieve files based on their metadata, respectively. Retrieval of files based on their absolute path in the HSM is also possible. Please see the sections Reference: metadata schemata, slk Usage Examples and StrongLink Command Line Interface (slk) for details on the usage of slk in this context.
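As an illustration of this workflow: the exact query syntax is defined in Reference: StrongLink query language; the query below is a hypothetical JSON-style example with placeholder paths, and (since slk search is deactivated) slk_helpers search_limited is used to submit it:

```shell
# Search for netCDF files below a namespace, then list and retrieve
# them via the returned search ID (query and paths are illustrative).
slk_helpers search_limited \
  '{"$and": [{"path": {"$gte": "/arch/xz0123/example"}},
             {"resources.name": {"$regex": ".nc$"}}]}'
# ...prints a search ID, e.g. 12345, which can then be used:
#   slk list 12345
#   slk retrieve 12345 /scratch/x/x000000/
```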

Packing of data


The packems package, developed by MPI-M and DKRZ for HPSS, has been partly adapted to the new HSM system. It simplifies packing and archiving multiple data files to tape as well as their retrieval. It consists of three command line programs:

  • a pack-&-archive tool packems,

  • a list-archived-content tool listems, and

  • a retrieve-&-unpack tool unpackems.

Please use module load packems on Mistral to load the packems package. packems will be installed on Levante in May. Updates of packems will follow in the next months. For details on the usage of packems, please have a look at the packems manual.

Backend data handling

Just like with the previous HSM system HPSS, a fast disk cache sits in front of the tape system. Files selected for archival are first copied to the disk cache and then successively written to tape. Files selected for retrieval are first copied from tape to the cache and then copied to the specified target locations. Retrieving files that are still (or again) stored in the disk cache is considerably faster than retrieving files located on tape only.

The distribution of files across the disk cache, the primary tape archive and the secondary tape archive is controlled automatically by the StrongLink software. Users have no control over the storage location of their data.

A - or t is appended to the permissions string of each file in the output of slk list. The - indicates that the file is stored in the cache, whereas t indicates that the file resides on tape only.
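As a small illustration, the trailing character of the permissions string reported by slk list can be classified locally. The sample strings below are made up, storage_location is a hypothetical helper, and the reading of t as "tape only" is inferred from the documented meaning of -:

```shell
# Classify a file's storage location from the trailing character of the
# permissions string shown by `slk list` ("-" = cached, "t" = tape only).
storage_location() {
  case "$1" in
    *t) echo "tape only" ;;
    *-) echo "cache" ;;
    *)  echo "unknown" ;;
  esac
}

storage_location "-rw-r--r--t"   # prints: tape only
storage_location "-rw-r--r---"   # prints: cache
```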

Comparison: HPSS and StrongLink HSM

The table below compares a few key features of HPSS and the new HSM (column "StrongLink"). Although StrongLink is only the name of the HSM software and is not related to the hardware, we refer to the whole system as "StrongLink".

Performance information for the new HSM system

| | HPSS | StrongLink |
|---|---|---|
| Disk cache | 2.5 PB (configured; 5 PB installed) | 1.2 PB |
| Throughput rate | 15 GB/s (average) | 30 GB/s (average; 15 GB/s upload + 15 GB/s download) |
| Annual throughput | 75 PB | 120 PB |
| Server setup and reliability | one HPSS server + one backup server on standby | 11 equal StrongLink nodes (node = server); HSM becomes read-only when two nodes fail |
| Expand HSM performance | replace HPSS server | add additional nodes as needed |
| Extended metadata | no | yes |
| Search files by metadata | no | yes |

It is important to note that the StrongLink system consists of n equal nodes. n is 11 at the moment but can be extended in the future when more bandwidth for data archival or retrieval is needed. If one node fails, the system as a whole still works without any restrictions. If a second node fails, the system goes into read-only mode to prevent inconsistencies in the metadata database. Hence, the system is considerably more failsafe than the previous system and easier to extend for future data workflows.

Further reading

  • Getting Started with slk
    • Introduction
    • Overview
    • Load slk module
    • On which nodes to run slk
    • Login
    • Archive a file or directory
    • List content of a namespace
    • Ensure that slk archive terminated properly and that all files were archived completely
    • Retrieve a file or namespace
    • Move, rename and delete files or namespaces
    • Search files
    • Debugging
    • packems
    • pyslk
    • further content
  • slk usage examples
    • Obtain Access Token
    • Check if access token is still valid
    • Archival
    • Search files
    • List files
    • Retrieve files
    • tag files (set metadata)
    • Change permissions and group of files and directories
    • Get user/group IDs and names
    • slk in batch jobs on compute nodes
  • manual pages slk
    • slk: official StrongLink Command Line Interface
    • slk helpers: slk extension provided by DKRZ
    • Reference: metadata schemata
    • Reference: StrongLink query language
  • External Access
  • FAQ
    • General information about the HSM system
    • Data Migration
    • Training, Questions and Adaption of Workflows
    • Archiving and Retrieval
    • Additional features
    • Advanced Technical Aspects
    • Common issues
    • Changelog
© Copyright 2021, Deutsches Klimarechenzentrum GmbH.