Reference: StrongLink verify jobs

file version: 03 Feb 2025

current software version: slk_helpers version 1.13.2

Introduction

StrongLink allows running so-called verify jobs, which check the integrity of files based on their size. A verify job is started via

slk_helpers submit_verify_job ...

StrongLink stores the expected file size at the beginning of an archival process. A verify job compares this expected file size against the actual size of the file. Checksums and other parameters are not considered in this process. Only files which pass this check are written to tape. Therefore, only cached files can be verified.

Verify jobs run in the same scheduling system as retrieval/recall jobs. This means that, if many recall jobs are waiting in the queue, new verify jobs are put at the end of the queue and have to wait. You can check the state of a job via

slk_helpers job_status <job_id>

Please do not submit more than ten verify jobs at once.

When a verify job is finished, you can fetch the results from StrongLink via

slk_helpers result_verify_job <job_id>

Run time

A verify job targeting a few thousand files runs for approximately one minute. If you provide more than 50 000 files, multiple verify jobs are submitted, each targeting up to 50 000 files. Run times we experienced for larger numbers of files are:

  • 10 000 files: 2.5 minutes

  • 50 000 files: 6 minutes

Shortcomings

Verify jobs do not detect certain problematic cases. Therefore, the result_verify_job command was extended with a few tests which are performed automatically after the results of the verify job have been collected. These checks might take some time; they can be deactivated with the parameter --quick.
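
For example, to fetch the results without the additional checks (job ID placeholder as in the examples above):

$ slk_helpers result_verify_job <job_id> --quick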

The problematic cases which are not detected by verify jobs are:

  • 0-byte files

  • same file archived twice in parallel

  • various inconsistencies in the metadata of files

0-byte files skipped

A verify job can be used to check whether a seemingly successful run of slk archive was actually successful, or to check which files of a failed run of slk archive were not completely archived. When slk archive starts, it first creates a 0-byte file entry in StrongLink for each file to be archived.

The default verify job of StrongLink skips many incompletely archived 0-byte files. They are easy to identify by a size check of the archived files, which many users already perform. However, users who trust only the verify job output would overlook these files.
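
Such a size check can be done from the command line. A minimal sketch, assuming that slk list prints the file size in the fourth column; the column index and size format may differ between slk versions, so adjust the pattern accordingly:

$ # flag files whose size column is 0 (size assumed in column 4)
$ slk list /dkrz_test/netcdf/20240116c | awk '$4 == 0'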

same file archived twice in parallel

When a user starts the same call of slk archive twice only a few seconds apart, the first archival creates the file entry in StrongLink, but both copies are archived in parallel to two different locations in the HSM cache. The metadata entry of the file is marked as completely archived as soon as one of the archivals finishes. The copy whose archival finishes last is linked to the metadata entry. In most cases this causes no issue. However, when one of the archivals finishes successfully and the other one fails or aborts afterwards, the defective/incomplete copy is linked to the metadata entry although the file is marked as successfully archived. Additionally, StrongLink will display the size of the complete copy as the file size, and repeated archivals will skip the file.

various inconsistencies in the metadata of files

Approximately 120 inconsistent metadata entries of archived files have been detected since StrongLink became operational in November 2021 (approx. 20 million files archived in total). As soon as we identify such a case, we request a correction from the StrongLink support. Some of these cases cannot be reproduced. Some cases can be reproduced but cannot be prevented by a normal StrongLink user. Most of these cases are not an actual problem but might confuse the owner of the data. In a few cases, the retrieval of the file might be blocked or an old file version might be retrieved instead of the current one. Files were actually defective in fewer than 10 cases.

Submitting a verify job

When you submit a verify job, you receive a job ID via which you can check the job status and fetch a verify report.

Submit a verify job for a namespace

$ slk_helpers submit_verify_job /dkrz_test/netcdf/20240116c -R
Submitting up to 1 verify job(s) based on results of search id 704756:
search results: pages 1 to 1 of 1; visible search results: 8; submitted verify job: 236962
Number of submitted verify jobs: 1

Please remember the verify job id 236962.

Submit a verify job for a list of files

$ slk_helpers submit_verify_job /dkrz_test/netcdf/20240116c/test_netcdf_a.nc /dkrz_test/netcdf/20240116c/test_netcdf_b.nc
Submitting up to 1 verify job(s) based on results of search id 704757:
search results: pages 1 to 1 of 1; visible search results: 2; submitted verify job: 236963
Number of submitted verify jobs: 1

Please remember the verify job id 236963.

Both commands might take a while because a search is performed in the background, which is slow in some situations. You can add the parameter -v for verbose output to see what the command is doing.

Submit a verify job for files matching a RegEx

Submit a verify job for all *.nc files in a namespace using a search and a search ID. StrongLink interprets regular expressions in filenames. It does not interpret regular expressions in folder names, and it does not interpret bash globs/wildcards. Note that the wildcard * corresponds to the regular expression .*.

$ slk_helpers gen_file_query -R /dkrz_test/netcdf/20240116a/.*nc --cached-only
{"$and":[{"$and":[{"path":{"$gte":"/dkrz_test/netcdf/20240116a"}},{"resources.name":{"$regex":".*nc"}}]},{"smart_pool":"slpstor"}]}

$ slk search '{"$and":[{"$and":[{"path":{"$gte":"/dkrz_test/netcdf/20240116a"}},{"resources.name":{"$regex":".*nc"}}]},{"smart_pool":"slpstor"}]}'
Search continuing... ...
Search ID: 704758

$ slk_helpers submit_verify_job --search-id 704758
Submitting up to 1 verify job(s) based on results of search id 704758:
search results: pages 1 to 1 of 1; visible search results: 8; submitted verify job: 236965
Number of submitted verify jobs: 1

Or simply

$ slk_helpers submit_verify_job  /dkrz_test/netcdf/20240116a/.*nc -R
Submitting up to 1 verify job(s) based on results of search id 704758:
search results: pages 1 to 1 of 1; visible search results: 8; submitted verify job: 236965
Number of submitted verify jobs: 1

Again, both commands might take a while because of the background search. You can add the parameter -v for verbose output to see what the command is doing.

$ slk_helpers submit_verify_job  /dkrz_test/netcdf/20240116a/.*nc -R -v
Generating search query.
Search query is: '{"$and":[{"$and":[{"path":{"$gte":"/dkrz_test/netcdf/20240116a"}},{"resources.name":{"$regex":".*nc"}}]},{"smart_pool":"slpstor"}]}'.
Starting search query.
Search ID is: 704758
Search continuing. ........
Starting on page 1/1 and ending on page 1/1 of the search results of search 704758 (1000 search results per page).
Submitting up to 1 verify job(s) based on results of search id 704758:
Collecting search results from page 1 to page 1
    Collecting search results  1 to 1000
Collected 8 search results from page 1 to page 1
Generate verify query
Submit verify query
search results: pages 1 to 1 of 1; visible search results: 8; submitted verify job: 236965
Number of submitted verify jobs: 1

Check the status of a verify job

The results of a verify job are provided as a verify report. A verify report should not be fetched before the verify job is finished. slk_helpers job_status can be used to check the processing state of a verify job. You can get the report if the state is COMPLETED or SUCCESSFUL. Otherwise, the report might not exist yet (e.g. in state QUEUED) or might be incomplete (e.g. in state PROCESSING).

$ slk_helpers job_status 156548
COMPLETED

$ slk_helpers job_status 157303
QUEUED (21)
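
If you want to wait for a job to finish from within a script, you can poll its state in a loop. A minimal bash sketch, assuming the job ID 156548 from above; a real script should also handle failure states and bound the waiting time:

$ while true; do
>     state=$(slk_helpers job_status 156548)
>     echo "current state: $state"
>     case "$state" in COMPLETED*|SUCCESSFUL*) break ;; esac
>     sleep 60
> done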

Getting results of a verify job

Please run this command to fetch the results of the verify job

$ slk_helpers result_verify_job 156548
Errors:
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_b.nc
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_c.nc
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_a.nc
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_f.nc
Erroneous files: 4

Four size-mismatch errors were detected. In this case, these files should be re-archived or deleted from the archive.
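
A re-archival is a normal call of slk archive over the affected files. A minimal sketch for one of the files above; the local source path is hypothetical:

$ # re-archive a file that was reported with a size mismatch (local source path hypothetical)
$ slk archive /work/xz0123/file_001gb_b.nc /dkrz_test/netcdf/20230925a/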

Getting raw results of a verify job

When the verify job is completed, the raw report can be fetched via slk_helpers job_report. It can be written into a file …

$ slk_helpers job_report 156548 --outfile verify_report_job_156548.txt

… or printed to the terminal

$ slk_helpers job_report 156548
StrongLink_Version,UNKNOWN
job_id,156548
job_name,slk_helpers Verify Job by user k204221
policy_type,VERIFY
job_status,COMPLETED
source_namespace,/dkrz_test/netcdf/20230925a
destination_pools,[37]
quick_check,true
update_deleted_files,false
email_job_log,false
job_start,Wed Oct 04 21:13:09 UTC 2023
job_end,Wed Oct 04 21:14:33 UTC 2023
scan_start,Wed Oct 04 20:56:18 UTC 2023
scan_end,Wed Oct 04 21:13:57 UTC 2023
io_start,Wed Oct 04 20:56:18 UTC 2023
io_end,Wed Oct 04 21:14:33 UTC 2023
scanned_directories,2
scanned_file_size,21104159996
scanned_files,13
scanned_skipped_no_copies,2
io_bytes,18400035132
io_file_not_found,1
io_files,5
io_verify_size_failed,5

create date,status,message,description
Wed Oct 04 21:14:33 UTC 2023,INFO,IO completed,The IO for job 156548 has completed
Wed Oct 04 21:13:57 UTC 2023,INFO,scanner completed,The scanner for job 156548 has completed
Wed Oct 04 21:13:11 UTC 2023,ERROR,File verification failed: Resource content size does not match record,Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_b.nc
Wed Oct 04 21:13:11 UTC 2023,ERROR,File verification failed: Resource content size does not match record,Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_a.nc
Wed Oct 04 21:13:11 UTC 2023,ERROR,File verification failed: Resource content size does not match record,Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_c.nc
Wed Oct 04 21:13:11 UTC 2023,ERROR,File verification failed: Resource content size does not match record,Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_f.nc
Wed Oct 04 21:13:11 UTC 2023,ERROR,File verification failed: File not found,File not found: /dkrz_test/netcdf/20230925a/file_001gb_g.nc
Wed Oct 04 21:13:11 UTC 2023,ERROR,File verification failed: Resource content size does not match record,Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_d.nc
Wed Oct 04 21:13:10 UTC 2023,INFO,Skipped Resource without RCR,Resource ID: 74481521010 / path: /dkrz_test/netcdf/20230925a/file_001gb_h.nc
Wed Oct 04 21:13:10 UTC 2023,INFO,Skipped Resource without RCR,Resource ID: 74481678010 / path: /dkrz_test/netcdf/20230925a/empty_file.txt
Wed Oct 04 21:13:10 UTC 2023,INFO,GNS Scan Started,GNS Scan for job 156548 is starting
Wed Oct 04 21:13:10 UTC 2023,INFO,GNS Scan Started,GNS Scan for job 156548 is starting
Wed Oct 04 21:13:10 UTC 2023,INFO,GNS Scan Started,GNS Scan for job 156548 is starting
Wed Oct 04 21:13:10 UTC 2023,INFO,GNS Scan Started,GNS Scan for job 156548 is starting
Wed Oct 04 21:13:10 UTC 2023,INFO,GNS Scan Started,GNS Scan for job 156548 is starting
Wed Oct 04 21:13:10 UTC 2023,INFO,GNS Scan Started,GNS Scan for job 156548 is starting
Wed Oct 04 20:56:18 UTC 2023,INFO,New job starting,Job 156548 is starting

Evaluating a verification report

An example verification report is printed in the previous section.

The lines below create date,status,message,description are of interest. This part of the report is CSV, so you can load it into the spreadsheet software of your choice or into a Python pandas.DataFrame; a short command-line sketch for a first inspection follows the list. Below we explain what certain status,message combinations mean and what should be done:

  • ERROR,File verification failed: Resource not stored in pool(s): [37]: please ignore this error; it indicates that the file has already been written to tape and deleted from the HSM cache

  • ERROR,File verification failed: Resource content size does not match record: please archive the file again

  • ERROR,File verification failed: File not found: directly contact support@dkrz.de

  • INFO,Skipped Resource without RCR: printed in different situations; please check which of the following cases applies
    • nothing to do if the target file has a size of 0 bytes and this is intended

    • please archive the file again if the target file has a size of 0 bytes but should be larger

    • directly contact support@dkrz.de if a size greater than 0 bytes is printed

  • INFO,Skipped Soft-Deleted Resource: please ignore this information; the target file has been marked for deletion (deleted from user perspective) but has not been cleaned up yet
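
For a quick first look without a spreadsheet, the event table can also be inspected with standard shell tools. A minimal sketch, assuming the report was saved as verify_report_job_156548.txt as shown above and that, as in the example report, the fields themselves contain no commas:

$ # show only the ERROR events of the report
$ grep ',ERROR,' verify_report_job_156548.txt
$ # count ERROR events per message (field 3 of "create date,status,message,description")
$ awk -F',' '$2 == "ERROR" {print $3}' verify_report_job_156548.txt | sort | uniq -c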
