slk helpers: slk extension provided by DKRZ#

file version: 16 Jan 2023

current software versions: slk_helpers version 1.7.1

slk_helpers is an extension to slk. slk is developed by StrongLink and is part of the StrongLink HSM software. slk_helpers was developed at DKRZ to provide useful functionality that is not included in slk. If usage information is missing on this help page or if you encounter errors, please contact support@dkrz.de.

Note

StrongLink uses the term “namespace” or “global namespace” (gns). A “namespace” is comparable to a “directory” or “path” on a common file system.

slk_helpers help#

$ slk_helpers help

lists all commands

slk_helpers version#

$ slk_helpers version

print the current slk_helpers version

slk_helpers checksum#

$ slk_helpers checksum [-t CHECKSUM_TYPE] RESOURCE_PATH
  • -t, --type: checksum_type (possible values: sha512, adler32); omit to print all available checksums

Prints the checksum(s) of a resource. If -t is set, the checksum of type CHECKSUM_TYPE is retrieved. Possible values are sha512 and adler32. If -t is not set, all available checksums are printed. It only works for files and not for namespaces. Namespaces do not have checksums.

StrongLink calculates two checksums of each archived file and stores them in the metadata. It compares the stored checksums with the file's actual checksums at certain stages of the archival and retrieval process. Users normally do not need to verify checksums manually, but they can do so if they wish. If a file has no checksum, it has not been fully archived yet (e.g. the copying is still in progress or the archival process was canceled).
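A manual check could look like the following minimal sketch. It assumes that slk_helpers checksum -t sha512 prints only the checksum string on stdout and that sha512sum (GNU coreutils) is available; the function name verify_archived_checksum is hypothetical and not part of slk_helpers.

```shell
# Compare the sha512 checksum of a local file with the checksum stored in the HSM.
# Assumption: `slk_helpers checksum -t sha512 PATH` prints only the checksum.
verify_archived_checksum() {
    local local_file="$1" gns_path="$2"
    local local_sum archived_sum

    # sha512sum prints "<checksum>  <filename>"; keep the first field only
    local_sum=$(sha512sum "$local_file" | awk '{ print $1 }')
    [ -n "$local_sum" ] || return 2

    # a non-zero exit code means: resource not found, no checksum, or another error
    archived_sum=$(slk_helpers checksum -t sha512 "$gns_path") || return 2

    if [ "$local_sum" = "$archived_sum" ]; then
        echo "match"
    else
        echo "mismatch"
        return 1
    fi
}
```

A call such as verify_archived_checksum ./output.nc /arch/xz1234/output.nc (placeholder paths) prints match or mismatch.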

slk_helpers exists#

$ slk_helpers exists RESOURCE_PATH

Check if the resource RESOURCE_PATH exists. The resource id is returned if it exists. exists works for files and namespaces.
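In scripts, the exit code is usually more convenient than the printed resource id. The following sketch maps the exit codes listed in the Exit codes section (0, 1, 2) to a status word; the function name resource_state is hypothetical.

```shell
# Map the exit code of `slk_helpers exists` to a short status word.
resource_state() {
    local rc=0
    # discard the printed resource id; only the exit code is evaluated
    slk_helpers exists "$1" > /dev/null 2>&1 || rc=$?
    case "$rc" in
        0) echo "exists" ;;
        1) echo "missing" ;;
        *) echo "error" ;;
    esac
}
```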

slk_helpers gen_file_query#

$ slk_helpers gen_file_query [-R] RESOURCE1 [RESOURCE2 [RESOURCE3 [...]]]
  • -R, --recursive: generate a query which does a recursive search

Generates a search query which can be used with slk search and slk_helpers search_limited to perform a search for the resources RESOURCE1, RESOURCE2, … . These can be either files or namespaces. If a filename without path is provided, then the file will be searched for everywhere in the HSM. Filenames may contain regular expressions but no bash wildcards/globs. The path to a file must not contain regular expressions. Detailed examples and explanations are given in Generate search queries for filenames.
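The generated query can be passed straight to a search. A minimal sketch, assuming gen_file_query prints only the query string on stdout; the function name search_files is hypothetical.

```shell
# Generate a query for the given resources and run it with search_limited.
search_files() {
    local query
    # all arguments (including an optional -R) are forwarded to gen_file_query
    query=$(slk_helpers gen_file_query "$@") || return 2
    # quote the query so that the shell passes it on as one unmodified argument
    slk_helpers search_limited "$query"
}
```

Because the query is held in a quoted variable, the $ operators inside it need no further escaping.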

slk_helpers gfbt#

please see group_files_by_tape

slk_helpers group_files_by_tape#

$ slk_helpers group_files_by_tape (RESOURCE1 [RESOURCE2 [RESOURCE3 [...]]]|--search-id SEARCH_ID) [-R] [--gen-search-query|--run-search-query] [--print-tape-id] [--print-tape-status]

RESOURCE1 [RESOURCE2 [RESOURCE3 [...]]] (<list of GNS paths>) or --search-id SEARCH_ID are mandatory as input. A combination of both is not allowed.

  • --gen-search-query: generate and print (a) search query strings instead of the lists of files per tape, Default: false

  • -R, --recursive: Search namespaces recursively for input files, Default: false

  • --run-search-query: generate and run (a) search query strings instead of printing the lists of files per tape, and print the search id(s), Default: false

  • --search-id SEARCH_ID: Use an existing search as input instead of a <list of GNS paths> (error is thrown if <list of GNS paths> is provided and --search-id is set)

  • --print-tape-id: print the tape id on the far left followed by a :, Default: false

  • --print-tape-status: print the status (avail or blocked) of the tape of each file group; if -i/--print-tape-id is set, this is printed: TAPE_ID, TAPE_STATUS: FILES; if -i/--print-tape-id is not set, this is printed: TAPE_STATUS: FILES, Default: false

Receives a list of files or a search id as input. Looks up which files are stored in the HSM cache and which are stored only on tape. Files on tape are grouped by tape: each line of the output contains all files which are on one tape. The user can directly create a search query for retrieving all files from one tape (--gen-search-query) or directly run this search (--run-search-query). In the latter case, the search id is printed per tape. To print the tape id and the tape status, use --print-tape-id and --print-tape-status, respectively.

slk_helpers hsm2json#

$ slk_helpers hsm2json [options] <GNS path>
  • --instant-metadata-record-output: not set: read the metadata records of all specified files and print them when the last record has been read; if set: print each metadata record directly after it has been read. Needs -l/--write-json-lines to be set. Default: false

  • -o FILE, --outfile FILE: write the output into a file instead of stdout

  • -q, --quiet: print nothing to stdout (e.g. no summary), Default: false

  • -R, --recursive: export metadata from the HSM recursively (all files in sub-directories of the provided source path will be considered), Default: false

  • -r FILE, --restart-file FILE: set a restart file in which the processed metadata entries are listed (if restart file exists, listed files will be skipped)

  • -s SCHEMA[,SCHEMA[...]], --schema SCHEMA[,SCHEMA[...]]: export only metadata fields of listed schemata (comma-separated list without spaces)

  • -v, --verbose: activate verbose mode, Default: false

  • -l, --write-json-lines: write JSON-lines instead of normal JSON, Default: false

  • -m MODE, --write-mode MODE: select write mode when -o/--outfile is set, Default: ERROR, Possible Values: [OVERWRITE, ERROR]

  • --print-summary: print summary on how many metadata records have been processed

  • --write-compact-json: do not print metadata as pretty but as compact JSON; default is pretty JSON

Extracts metadata from HSM file(s) and returns them in JSON structure. See JSON structure for/of metadata import/export for details.
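As an illustration, a recursive export into a JSON-lines file could be wrapped as follows. This is a sketch that only combines flags from the option list above; the function name export_metadata_jsonl and the paths in the usage note are placeholders.

```shell
# Export the metadata of all files below a namespace into a JSON-lines file.
export_metadata_jsonl() {
    local gns_path="$1" outfile="$2"
    # -R: recurse into sub-namespaces, -l: write JSON-lines,
    # -o: write to a file; the default write mode ERROR refuses to overwrite it
    slk_helpers hsm2json -R -l --print-summary -o "$outfile" "$gns_path"
}
```

For example, export_metadata_jsonl /arch/xz1234/experiment metadata.jsonl writes the metadata records to metadata.jsonl.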

slk_helpers hostname#

$ slk_helpers hostname

Prints the hostname to which slk is currently connected or to which slk will connect. It should be archive.dkrz.de, which is the default value on each Levante node. You can override the default hostname by exporting the environment variable SLK_HOSTNAME (e.g. export SLK_HOSTNAME=stronglink.hsm.dkrz.de in bash).

slk_helpers iscached#

$ slk_helpers iscached RESOURCE_PATH

Checks if the resource RESOURCE_PATH is stored in the HSM cache. A text message tells the user whether RESOURCE_PATH is cached. Additionally, the exit code is 0 if the resource is in the cache and 1 if it is not (read the exit code from the variable $? directly after the slk_helpers call). A file that is not stored in the cache is stored only on tape. Retrievals from tape take considerably longer than retrievals from the cache.
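The exit code makes it easy to sort a list of files into cached and tape-only files before a retrieval. A minimal sketch based on the exit codes listed in the Exit codes section; the function name classify_by_cache is hypothetical.

```shell
# Read one GNS path per line from stdin and print "<state><TAB><path>".
classify_by_cache() {
    local path rc
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        rc=0
        # stdin is redirected so that the call cannot consume the path list
        slk_helpers iscached "$path" > /dev/null 2>&1 < /dev/null || rc=$?
        case "$rc" in
            0) printf 'cached\t%s\n' "$path" ;;
            1) printf 'tape\t%s\n' "$path" ;;
            *) printf 'error\t%s\n' "$path" ;;   # e.g. resource does not exist
        esac
    done
}
```

It can be fed from a file list, e.g. classify_by_cache < files.txt.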

slk_helpers json2hsm#

$ slk_helpers json2hsm [options] <SL-JSON metadata file> <GNS path>
  • -l, --expect-json-lines: consider the input file to be JSON-lines instead of normal JSON, Default: false

  • --ignore-non-existing-metadata-fields: if set: a metadata field of the source metadata record that does not exist in StrongLink is skipped; if not set: throw an error and exit as soon as a source metadata field does not exist in StrongLink. If this flag is not set but -k/--skip-bad-metadata-sets is set, then metadata records with non-existing metadata fields will be skipped. Default: false

  • --instant-metadata-record-update: not set: read the whole JSON file, collect all metadata updates and apply all updates at the end; if two metadata records exist for one resource, this will become apparent before any metadata are written; if set: write each metadata record to StrongLink directly after it has been read from the JSON file; if two metadata records exist for one resource, the first metadata record will be written to StrongLink and the duplication will remain undetected until the duplicate record is read from JSON. Default: false

  • -q, --quiet: print nothing to stdout (e.g. no summary), Default: false

  • -r FILE, --restart-file FILE: set a restart file in which the processed metadata entries are listed (if restart file exists, listed files will be skipped)

  • -s SCHEMA[,SCHEMA[...]], --schema SCHEMA[,SCHEMA[...]]: import only metadata fields of listed schemata (comma-separated list without spaces)

  • -k, --skip-bad-metadata-sets: skip damaged / incomplete metadata sets [default: throw error], Default: false

  • -v, --verbose: activate verbose mode, Default: false

  • -m MODE, --write-mode MODE: select write mode for metadata, Default: OVERWRITE, Possible Values: OVERWRITE, KEEP, ERROR, CLEAN (CLEAN: first, delete all metadata from the target schema and, then, write new metadata)

Reads metadata from a JSON file and writes them to archived files in the HSM. Uses the relative paths from the metadata records plus the base path provided by the user to identify the target files. See JSON structure for/of metadata import/export for details.

slk_helpers job_exists#

$ slk_helpers job_exists JOB_ID

Check if a tape read job with the given ID exists.

slk_helpers job_queue#

$ slk_helpers job_queue

Prints status of the queue of tape read jobs. The output looks like this:

$ slk_helpers job_queue
total read jobs: 110
active read jobs: 12
queued read jobs: 98
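If only one of these numbers is needed in a script, it can be extracted from the output. A minimal sketch that relies on the output format shown above; the function name queued_read_jobs is hypothetical.

```shell
# Print the number of queued tape read jobs.
queued_read_jobs() {
    # split each line at ": " and print the value of the "queued read jobs" line
    slk_helpers job_queue | awk -F': *' '/^queued read jobs:/ { print $2 }'
}
```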

slk_helpers job_status#

$ slk_helpers job_status JOB_ID

Check the status of a tape read job with the given ID. The status is one of these: ABORTED, QUEUED, PROCESSING and COMPLETED. When the status is QUEUED then the place in the queue is appended in brackets, e.g.: QUEUED (12).
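A script can poll the status until the job has finished. The following is a sketch, assuming the status is printed as the first word on stdout as described above; the function name wait_for_job and the default interval of 60 seconds are arbitrary choices.

```shell
# Wait until a tape read job has finished.
# Returns 0 on COMPLETED, 1 on ABORTED and 2 on any error.
wait_for_job() {
    local job_id="$1" interval="${2:-60}" status
    while true; do
        status=$(slk_helpers job_status "$job_id") || return 2
        case "$status" in
            COMPLETED*) return 0 ;;
            ABORTED*)   return 1 ;;
            # "QUEUED (12)" also matches here because of the trailing *
            QUEUED*|PROCESSING*) sleep "$interval" ;;
            *) echo "unexpected status: $status" >&2; return 2 ;;
        esac
    done
}
```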

slk_helpers metadata#

$ slk_helpers metadata RESOURCE_PATH
  • --alternative-output-format: different format to print metadata (each row is: schema.field: value), Default: false

Prints the available metadata of a resource. It is the counterpart to slk tag: slk tag sets metadata, whereas slk_helpers metadata prints metadata.

slk_helpers mkdir#

$ slk_helpers mkdir [-R] GNS_PATH
  • -R: create parent namespaces recursively if they do not exist; Default: false

Creates a namespace in an already existing namespace (== create basename GNS_PATH in dirname GNS_PATH). This command works like mkdir on a Linux terminal. When -R is set, nested namespaces are created recursively (like mkdir -p on a Linux terminal).

slk_helpers resourcepath#

$ slk_helpers resourcepath RESOURCE_ID

Gets path for a resource id

slk_helpers search_limited#

$ slk_helpers search_limited "RQL Search Query"
$ slk_helpers search_limited 'RQL Search Query'

This command will conduct a background search for files that match the specific query specified using a query language syntax that was designed by StrongLink. The query language is described on the StrongLink query language page and in the StrongLink Command Line Interface Guide from page 6 onwards.

Note

Operators in queries start with a $. If a query is delimited by " then the $ has to be escaped by a leading \ (\$OPERATOR). Otherwise, the operator is interpreted as environment variable by the shell. Alternatively, use ' as delimiter.
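The two quoting styles can be compared directly in the shell. In this sketch the path is a placeholder, and the field name path with the operator $gte is an assumption for illustration only; see the query language documentation for the actual fields and operators.

```shell
# Double quotes: the $ of the operator and the inner " must be escaped.
query_double="{\"path\":{\"\$gte\":\"/arch/xz1234/example\"}}"

# Single quotes: the shell leaves the content untouched, no escaping needed.
query_single='{"path":{"$gte":"/arch/xz1234/example"}}'

# Both variables now hold the same query string.
printf '%s\n' "$query_double" "$query_single"
```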

A search result ID (search_id) is returned if 1000 or fewer results were found. If more results were found, an error is printed and no search ID is returned. The limit of 1000 refers to the total number of results, some of which might not be visible to the user. The search ID can be used to list and retrieve files from the archive (see below).

One might need the user or group ids of respective users/groups to search files belonging to them. These ids are obtained as follows.

Get user id:

# get your user id
$ id -u

# get the id of any user
$ id -u USER_NAME

# get the id of any user
$ getent passwd USER_NAME
#  OR
$ getent passwd USER_NAME | awk -F: '{ print $3 }'

# get user name from user id
$ getent passwd USER_ID | awk -F: '{ print $1 }'

Get group id:

# get group ID from group name
$ getent group GROUP_NAME
#  OR
$ getent group GROUP_NAME | awk -F: '{ print $3 }'

# get group name from group id
$ getent group GROUP_ID | awk -F: '{ print $1 }'

# get the names and ids of all groups of which you are a member
$ id

Please see our documentation of specific usage examples (slk Usage Examples) and the StrongLink Command Line Interface Guide for example calls of slk search. These can be used unchanged with slk_helpers search_limited.

Note

slk_helpers search_limited counts all files and namespaces that match the search query, including those that the current user is not allowed to see/read. slk list lists only the matching files that the user may read. Therefore, slk_helpers search_limited might report that more than 1000 resources were found although fewer than 1000 matches are readable by the user. Moreover, different users might get different output from slk list for the same search id.

slk_helpers session#

$ slk_helpers session

Prints until when the current slk session is valid.
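Batch scripts can use the exit code of this command (0: valid login token, 1: no token or session expired, see Exit codes) to stop early. A minimal sketch; the function name require_session is hypothetical.

```shell
# Return 0 if a valid slk session exists, otherwise print a hint and return 1.
require_session() {
    if slk_helpers session > /dev/null 2>&1; then
        return 0
    fi
    echo "no valid slk session; please run 'slk login' first" >&2
    return 1
}
```

A script would call require_session || exit 1 before starting any archival or retrieval.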

slk_helpers size#

$ slk_helpers size RESOURCE_PATH

Returns the file size in bytes

slk_helpers tape_exists#

$ slk_helpers tape_exists TAPE_ID

Returns whether the tape with tape id TAPE_ID exists in the tape library or not.

slk_helpers tape_status#

$ slk_helpers tape_status [--details] TAPE_ID
  • --details: print a more detailed description of the retrieval status (different states of avail are possible)

Prints the status of a tape for retrievals: avail or blocked.

Exit codes#

command                                  task                                           exit code
---------------------------------------  ---------------------------------------------  ---------
bad input command                        always (redirected to slk help)                2
general (not help, version and session)  session expired                                2
                                         issue related to config file                   2
help                                     always                                         0
checksum                                 resource exists and has checksum               0
                                         resource not found                             1
                                         requested checksum not available               1
                                         any other error                                2
exists                                   resource exists                                0
                                         resource does not exist                        1
                                         any error                                      2
export_metadata                          any error                                      2
gen_file_query                           query successfully generated                   0
                                         any error                                      2
gfbt (same as group_files_by_tape)       files successfully grouped                     0
                                         any error                                      2
group_files_by_tape                      files successfully grouped                     0
                                         any error                                      2
hostname                                 hostname is set and is as printed              0
                                         any error                                      2
hsm2json                                 metadata exported successfully                 0
                                         any error                                      2
iscached                                 resource exists and is cached                  0
                                         resource exists and is not cached              1
                                         resource does not exist                        2
                                         any error                                      2
json2hsm                                 metadata imported successfully                 0
                                         any error                                      2
job_exists                               job exists                                     0
                                         job does not exist                             1
                                         any error                                      2
job_queue                                number of jobs printed successfully            0
                                         any error                                      2
job_status                               status of the job printed successfully         0
                                         any error                                      2
list_search                              search id correct and search results to print  0
                                         search id correct but no results to print      1
                                         any error                                      2
metadata                                 resource exists and metadata available         0
                                         resource does not exist                        1
                                         any error                                      2
mkdir                                    namespace successfully created                 0
                                         namespace with same name already exists        1
                                         any other error                                2
resourcepath                             resource with given ID exists                  0
                                         resource with given ID does not exist          1
                                         any error                                      2
search_limited                           search successfully performed                  0
                                         any error                                      2
session                                  login token exists and is not expired          0
                                         no login token                                 1
                                         session expired                                1
size                                     resource exists and has size                   0
                                         resource does not exist                        1
                                         any error                                      2
tape_exists                              tape exists                                    0
                                         tape does not exist                            1
                                         any error                                      2
tape_status                              tape is available for reading                  0
                                         tape is blocked / currently no reading         1
                                         any error                                      2
version                                  always                                         0

Technical background of selected commands#

slk_helpers gen_file_query#

The search query is generated as follows: The input file list is taken and each path is separated into filename (like basename PATH) and directory (like dirname PATH). All filenames in the same directory are grouped and a regular expression is generated which finds exactly these files. Then this expression is linked via an and to the respective directory in which these files are located. This is done for each distinct directory in the input. These search expressions are linked via an or at the top level.

It is checked whether a directory exists in StrongLink. An error is thrown if it doesn’t exist.

The resulting search query can be optimized in length by the user. We do not do this in the slk_helpers because it would add considerable complexity to the code.
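The grouping step described above can be illustrated with a few lines of awk. This is a simplified sketch and not the slk_helpers implementation: it assumes absolute paths, does not escape regular expression metacharacters in filenames, and prints a tab-separated directory and filename alternation instead of a StrongLink query. The function name group_by_directory is hypothetical.

```shell
# Read one absolute path per line and print "<directory><TAB><file1|file2|...>".
group_by_directory() {
    awk '
        {
            # split at "/" and take the last element (like basename PATH)
            n = split($0, parts, "/")
            file = parts[n]
            # everything before the last "/" (like dirname PATH)
            dir = substr($0, 1, length($0) - length(file) - 1)
            if (dir == "") dir = "/"
            if (dir in files) {
                files[dir] = files[dir] "|" file
            } else {
                order[++count] = dir   # remember the order of first appearance
                files[dir] = file
            }
        }
        END {
            for (i = 1; i <= count; i++)
                printf "%s\t%s\n", order[i], files[order[i]]
        }
    '
}
```

Each output line corresponds to one directory condition joined by an and with one filename expression; the lines together correspond to the top-level or.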

Major Changes#

1.7.1#

  • fixed new tape status ERRORSTATE

1.7.0#

  • new commands: job_exists, job_status, job_queue

  • group_files_by_tape (and tape_status):
    • modified structure of the output

    • new tape status ERRORSTATE when the tape is in a bad state which needs intervention from the support

  • increased timeout for the time to establish a connection

  • minor restructuring of the code

1.6.0#

1.5.8#

  • fixed errors related to processing of JSON returned by StrongLink

  • restructured code

1.5.7#

changes from 1.4.0 to 1.5.7

  • json2hsm / import_metadata:
    • renamed command import_metadata to json2hsm

    • removed parameter --update-only-one-resource

    • --write-mode got new option CLEAN, which removes all metadata of the selected resource before setting the new metadata

    • prints JSON formatted summaries when --print-json-summary is set

  • hsm2json / export_metadata:
    • renamed command export_metadata to hsm2json

    • hsm2json prints an export summary when new parameter --print-summary is set

    • hsm2json prints JSON formatted summaries when --print-json-summary is set

    • does not print metadata as pretty but as compact JSON when --write-compact-json is set

  • slk_helpers list_search: print search results continuously (in contrast to collecting all search results, first, before printing them altogether as slk list does it)

  • updated tests

1.4.0#

  • removed import_metadata_recursive

  • merged the three other import_metadata_* commands into import_metadata:
    • --update-only-one-resource PATH_OR_ID => like import_metadata_one_file

    • --use-res-id => like import_metadata_use_res_id

    • none of the previous flags => like import_metadata_use_abs_path

  • JSON structure of metadata was incremented from v1.0.0 to v2.0.0; v2.0.0 is equal to the output of slk tag -display RESOURCE

  • remove -Q/--fully-quiet flag (fully quiet; suppress error messages)

  • readme updated

1.3.x#

  • new commands:
    • export_metadata

    • import_metadata_one_file

    • import_metadata_recursive

    • import_metadata_use_abs_path (hidden; meant for expert users)

    • import_metadata_use_res_id (hidden; meant for expert users)

  • new flags / arguments
    • slk_helpers metadata now has --alternative-output-format

  • slk_helpers gen_file_query accepts a file list in a string which is separated by newlines

  • minor bug fixes

1.2.x#

  • new commands
    • gen_file_query: create a query string to search files, which are provided as input

    • list_search: list search results (incl. path of resources)

    • updated exit codes

1.1.x#

  • new commands
    • iscached: prints out whether a file is cached (== quick access) or not

    • search_limited: like slk search but works only for searches that yield 1000 results or fewer

    • version: prints the version of slk_helpers