HSM StrongLink (tape archive)

file version: 24 Jan 2023

Introduction

DKRZ operates a hierarchical storage management system (HSM) used to store all relevant data created and post-processed on DKRZ systems. The hardware of the HSM consists of a disk cache and two tape libraries. The primary tape archive has a storage capacity of more than 300 PB and is located in the DKRZ building in Hamburg. Selected files are mirrored to the secondary tape archive located at the Max Planck Computing and Data Facility (MPCDF) in Garching. The software operating the HSM is StrongLink. All command-line user interaction with the tape archive goes through StrongLink and its command line tool slk.

If you have questions that are not answered on this page or on the linked pages, please have a look at our FAQ. If you do not find an answer there, please contact us via support@dkrz.de.

Storage options and quota

The amount of data that can be stored in the tape archive per project is limited by that project's storage quota; individual users do not have quotas of their own. Storage space on the HSM is applied for in conjunction with the (bi-)annual application for DKRZ compute and storage resources. You can check your storage quota via https://luv.dkrz.de . There is a normal tape archive quota, denoted arch, and a quota for long-term archival, denoted docu. Additionally, users may choose to store very important files twice, i.e. one copy in Hamburg and one copy in Garching. The following table provides an overview:

Storage location, time and quota

| File Storage | Storage Time | Used quota | How to achieve this | Past location (HPSS) | New location (StrongLink) |
|---|---|---|---|---|---|
| single copy on tape | 1 year after expiration of DKRZ project | arch quota | default storage type | /hpss/arch/<prj> | /arch/<prj> |
| second copy on separate tape | 1 year after expiration of project | arch quota; double file size used | store data in specific root namespace (see right column) | /hpss/double/<prj> | /double/<prj> |
| long-term storage for reference purposes | 10 years after expiration of project | docu quota | please contact data@dkrz.de | /hpss/doku/<prj> | /doku/<prj> |

The tape archive delivers its best performance when the files to be archived are sufficiently large. The packing recommendations developed for HPSS therefore remain in effect for the time being. As with the HPSS system, used quota is accounted in increments of 1 GB per archived file. The optimal size of files written to tape is between 1 GB and a few hundred GB.

slk: command line tool for tape access

Note

slk stores a login token in the home directory of each user (~/.slk/config.json). The login token is valid for 30 days. By default, this file can only be accessed by the respective user (permissions: -rw-------/600). However, users should be careful when doing things like chmod 755 * in their home directory. If you suspect that your slk login token has been compromised, please contact support@dkrz.de.
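
If the permissions of the token file were accidentally loosened, e.g. by a stray chmod in the home directory, they can be tightened again as follows (a minimal sketch; the path is the one named above):

```bash
# restrict the slk login token to the owning user again
chmod 600 ~/.slk/config.json

# verify the result; the permissions should read -rw-------
ls -l ~/.slk/config.json
```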

The StrongLink software comes with a command line tool suite, slk. slk is the user interface to the StrongLink software and allows the user to interact with the HSM. The available commands are (a short example session follows this list):

  • help: displays the slk help page

  • version: print the slk version

  • login: log in to the system with LDAP credentials

  • archive: copy files to the HSM

  • chmod: modify permissions of archived files (same as chmod on the Linux shell)

  • delete: delete a namespace (and all of its child objects) or a specific file

  • group: change group ownership of archived files; for file owners and admins only

  • owner: change ownership of archived files; for admins only

  • tag: modify metadata of archived files

  • search: search archived files based on metadata

  • list: list searched files and some of their metadata (similar to ls on the Linux shell)

  • retrieve: retrieve files based on a search result or an absolute path; retrieves no more than 500 files; limited availability, see Retrievals from tape

  • recall: recall files based on a search result or an absolute path (a recall copies files from tape to the HSM cache)

  • move: move a file or a namespace from one namespace to another (might be merged with slk rename in the future)

  • rename: rename a file or a namespace (might be merged with slk move in the future)
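
A minimal example session, assuming a project namespace /arch/<prj> for which you have write permission (all paths are placeholders):

```bash
# create a login token (valid for 30 days)
slk login

# copy a local file into the project namespace
slk archive /work/<prj>/experiment1/results.tar /arch/<prj>/experiment1/

# list the archived content
slk list /arch/<prj>/experiment1/

# copy the file back (on a compute, shared or interactive node)
slk retrieve /arch/<prj>/experiment1/results.tar /scratch/<usr>/
```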

Note

StrongLink uses the terms “namespace” and “global namespace” (gns). A “namespace” is comparable to a “directory” or “path” on a common file system.

Note

slk does not provide its own shell. slk login simply creates a login token which allows the other slk commands to be used. It is not possible to navigate through the emulated directory structure of the HSM using a cd-like command. Instead, each slk command needs the full path of resources as input.

Note

On Levante login nodes, slk retrieve can only retrieve one file at a time. There are no such limitations on nodes of the compute, shared and interactive partitions. All other slk commands are available without limitations on all nodes. If you wish to archive or retrieve large files or many files interactively, please start an interactive batch session on the interactive partition with salloc (see Run slk in the “interactive” partition and Data Processing on Levante); a sketch follows below.
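
A minimal sketch of such an interactive session; the partition name is taken from the note above, while the account and time limit are placeholder salloc options that depend on your project:

```bash
# request an interactive session on the "interactive" partition
salloc --partition=interactive --account=<prj> --time=04:00:00

# inside the allocation, slk runs without the login-node limitations
slk retrieve /arch/<prj>/experiment1/ /scratch/<usr>/
```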

Please read Known Issues before you use slk for the first time. Please have a look at StrongLink Command Line Interface (slk) (on docs.dkrz.de) or at the StrongLink Command Line Interface Guide for a detailed description of the individual commands. Alternatively, the page slk Usage Examples contains several usage examples.

slk helpers: add-on to slk

slk lacks a few minor but very useful features, which the slk_helpers program adds. Its usage is very similar to that of slk. The following commands are available (a short example follows the list):

  • checksum: prints one or all checksums of a resource

  • exists: check if a resource exists; the resource id is returned

  • help: print help / usage information

  • gen_file_query: generate a query to find files

  • group_files_by_tape/gfbt: groups a list of files based on which tape they are stored on

  • json2hsm: import metadata from a JSON file into the HSM

  • job_exists: check if a tape read job with the given ID exists

  • job_queue: prints status of the queue of tape read jobs

  • job_status: check the status of a tape read job with the given ID

  • hostname: prints the hostname to which slk is currently connected or will connect

  • hsm2json: export file metadata as JSON

  • iscached: checks if a file is in the HSM cache

  • list_search: get the paths of all resources found by a search ID (similar to slk list SEARCH_ID but prints results continuously)

  • mkdir: create a namespace in an already existing namespace (like mkdir on the Linux shell)

  • metadata: get metadata of a resource

  • resourcepath: get path for a resource id

  • search_limited: submits a search and returns the search ID if 1000 or fewer results are found

  • session: prints the time until which the current slk session is valid

  • size: returns the file size in bytes

  • tape_exists: check if a tape exists

  • tape_status: returns the status of a tape for retrieval operations (write operations block tapes for reading)

  • version: prints the version
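
A few illustrative calls; the paths are placeholders and the commands are those listed above:

```bash
# check whether a file exists in the archive; its resource id is printed
slk_helpers exists /arch/<prj>/experiment1/results.tar

# check whether the file is currently in the HSM disk cache
slk_helpers iscached /arch/<prj>/experiment1/results.tar

# generate a search query that finds this file
slk_helpers gen_file_query /arch/<prj>/experiment1/results.tar

# print until when the current login token is valid
slk_helpers session
```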

Please have a look at slk helpers (on docs.dkrz.de) for a detailed description of the individual commands. Alternatively, the section slk Usage Examples contains several usage examples.

pyslk: Python slk wrapper

We offer a Python wrapper package for slk and the slk_helpers. Most commands of these two command line interfaces have corresponding wrappers in pyslk. Usage examples of pyslk are shown in the section pyslk in Getting Started with slk. Additionally, a pyslk API reference is provided. Feel free to download the package from pyslk availability.

External access via sftp

Currently, the HSM can only be accessed via Levante.

Metadata: harvesting, manual manipulation and search

The StrongLink software reads and extracts extended file metadata from the headers of archived netCDF files and some other file types. Users may edit some of these metadata and add further metadata via slk tag and slk_helpers json2hsm. The existing metadata are described in detail on our metadata manual page (Reference: metadata schemata). Stored metadata can be printed via slk_helpers metadata, slk tag -display and slk_helpers hsm2json; these three commands are meant for different use cases (see File Search and Metadata).
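
For example, to print the stored metadata of a single archived file (the path is a placeholder; the exact argument forms are described on the slk helpers manual page):

```bash
# print the metadata of one archived file
slk_helpers metadata /arch/<prj>/experiment1/results.nc
```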

Files can be searched and found based on their metadata. The StrongLink software provides the command line tools slk search, slk list and slk retrieve to search, list and retrieve files based on their metadata, respectively. Retrieval of files based on their absolute path in the HSM is also possible. Please see the sections Reference: metadata schemata, slk Usage Examples and StrongLink Command Line Interface (slk) for details on the usage of slk in this context; a short sketch follows.
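
A minimal sketch of the search-then-retrieve workflow; the query is generated with slk_helpers gen_file_query instead of being written by hand, and the search ID printed by slk search (here 12345) is a placeholder:

```bash
# generate a query matching the file(s) of interest
query=$(slk_helpers gen_file_query /arch/<prj>/experiment1/results.tar)

# submit the search; slk search prints a numeric search ID
slk search "$query"

# list the files found by that search, then retrieve them
slk list 12345
slk retrieve 12345 /scratch/<usr>/
```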

Packing of data

The tape archive delivers its best performance when the files to be archived are sufficiently large, so the packing recommendations remain in effect for the time being. As with the HPSS system, used quota is accounted in increments of 1 GB per archived file. However, the automatic metadata import from netCDF files does not work when they are packed. Hence, it has to be weighed for each use case whether lower consumption of storage space or enriched file metadata is more valuable.

The package packems, which was developed by MPI-M and DKRZ for HPSS, has been partly adapted to the new HSM system. It simplifies the packing & archiving of multiple data files to tape as well as their retrieval. It consists of three command line programs:

  • a pack-&-archive tool packems,

  • a list-archived-content tool listems and

  • a retrieve-&-unpack tool unpackems.

Please use module load packems to load the packems package. For details on the usage of packems, please have a look at the packems manual.
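
As an illustration, a hedged sketch of what packems automates: packing files manually with tar and archiving the resulting bundle with slk archive (all paths are placeholders):

```bash
# pack a directory into a single bundle; aim for bundle sizes
# between 1 GB and a few hundred GB
tar -cvf experiment1.tar -C /work/<prj> experiment1/

# archive the bundle to the project namespace
slk archive experiment1.tar /arch/<prj>/
```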

Backend data handling

As with the previous HSM system HPSS, a fast disk cache sits upstream of the tape system. Files selected for archival are first copied to the disk cache and then successively written to tape. Files selected for retrieval are first copied from tape to the cache and then copied to the specified target locations. Retrieving files that are still/already stored in the disk cache is considerably faster than retrieving files that are located on tape only.

The distribution of the files in the disk cache, primary tape archive and secondary tape archive is automatically controlled by the software StrongLink.

A - or a t is appended to the permissions string of each file in the output of slk list and slk_helpers list_search. The - indicates that the file is stored in the cache; the t indicates that the file is stored on tape. Alternatively, the command slk_helpers iscached can be used to check whether a file is currently stored in the cache.
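
For example (the path is a placeholder):

```bash
# the character appended to the permissions string shows the storage level:
# "-" = file is in the disk cache, "t" = file is on tape only
slk list /arch/<prj>/experiment1/

# explicit check for a single file
slk_helpers iscached /arch/<prj>/experiment1/results.tar
```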

Further reading

  • Getting Started with slk
    • Introduction
    • Start using slk
    • Archival
    • List content
    • Validate archivals
    • Retrieve a file or namespace
    • Move, rename and delete files or namespaces
    • Search files
    • Run slk in the “interactive” partition
    • Run slk as batch job
    • Debugging
    • packems
    • pyslk
  • Known Issues (read this!)
    • slk issues on Levante
    • slk archive/retrieve may use much memory and CPU time – careful with parallel slk calls on one node
    • slk archive/retrieve is killed
    • slk is hanging / unresponsive
    • Non-recursive is semi-recursive!?
    • slk writes no output in non-interactive mode
    • slk never writes to stderr
    • slk move cannot rename files
    • slk archive compares file size and timestamp prior to overwriting files
    • Availability of archived data and modified metadata is delayed by a few seconds
    • A file listed by slk list is not necessarily available for retrieval yet
    • failed/canceled slk archive/retrieve calls leave file fragments
    • slk does not have a –version flag
    • slk performance on different node types
    • group memberships of user updated on login
    • LDAP user not known to StrongLink prior to first login
    • “slk retrieve /source/ /target” and “slk retrieve /source /target” are not the same
    • Filtering slk list results with “*”
    • How to search non-recursively in a namespace
    • Terminal cursor disappears if slk command with progress bar is canceled
    • error “conflict with jdk/…” when the slk module is loaded
    • slk needs at least Java version 13
    • slk search yields RQL parse error
    • slk login asks me to provide a hostname and/or a domain
    • Archival fails and Java NullPointerException in the log
    • slk ERROR: Unhandled error occurred, please check logs
    • slk archive: Exception …: lateinit property websocket has not been initialized
    • slk delete failed, but nevertheless file was deleted
    • slk list will take very long when many search results are found
    • slk retrieve returns exit code 1 if one or more files are skipped
  • Archivals to tape
    • Introduction and Summary
    • Useful information on slk archive
    • What to do when slk archive was interrupted/killed?
    • How much data can I archive at once?
    • Validate archivals
    • Archival script templates
  • Retrievals from tape
    • Introduction and Summary
    • Recommendations for usage of slk retrieve
    • Resume interrupted retrievals
    • Speed up your retrievals
    • Waiting and processing time of retrievals
    • Retrieval script templates
  • File Search and Metadata
    • Metadata in StrongLink
    • Set metadata
    • Print metadata
    • Search files by metadata
  • slk usage examples
    • Obtain Access Token
    • Check if access token is still valid
    • Archival
    • Search files
    • Generate search queries
    • List files
    • Retrieve files
    • tag files (set metadata)
    • print metadata (display tags)
    • Change permissions and group of files and directories
    • Get user/group IDs and names
    • slk in batch jobs on compute nodes
  • manual pages slk
    • slk: official StrongLink Command Line Interface
    • slk helpers: slk extension provided by DKRZ
    • Reference: metadata schemata
    • Reference: StrongLink query language
    • JSON structure for/of metadata import/export
    • Official StrongLink Command Line Interface Guide
  • FAQ
    • General information about the HSM system
    • Training, Questions and Adaption of Workflows
    • Archiving and Retrieval
    • Additional features
    • Advanced Technical Aspects
    • Common issues
    • Changelog