slk helpers: slk extension provided by DKRZ#

file version: 18 Dec 2023

current software versions: slk_helpers version 1.11.0

The slk_helpers are an extension to slk. slk is developed by StrongLink and belongs to the StrongLink HSM software. The slk_helpers have been developed at DKRZ to provide useful functionality that is not included in slk. If specific usage information is missing on this help page or if you encounter errors, please contact support@dkrz.de.

Note

StrongLink uses the term “namespace” or “global namespace” (gns). A “namespace” is comparable to a “directory” or “path” on a common file system.

slk_helpers#

$ slk_helpers (--pid|--help [COMMAND]|COMMAND ....)
  • --help: print help for COMMAND if specified and print general help otherwise

  • --pid: print the process id of the slk_helpers command

slk_helpers help#

$ slk_helpers help

lists all commands

slk_helpers version#

$ slk_helpers version

print the current slk_helpers version

slk_helpers checksum#

$ slk_helpers checksum [-t CHECKSUM_TYPE] (RESOURCE_PATH|--resource-id RESOURCE_ID)
  • --resource-id: get the checksum of a file with given resource id instead of path; default: -1

  • -t, --type: checksum_type (possible values: sha512, adler32); omit to print all available checksums

Prints the checksum(s) of a resource. If -t is set, the checksum of type CHECKSUM_TYPE is retrieved. Possible values are sha512 and adler32. If -t is not set, all available checksums are printed. It only works for files and not for namespaces. Namespaces do not have checksums.

StrongLink calculates two checksums of each archived file and stores them in the metadata. It compares the stored checksums with the file’s actual checksums at certain stages of the archival and retrieval process. Users commonly do not need to verify checksums manually, but they can do so if they prefer. If a file has no checksum, it has not been fully archived yet (e.g. the copying is still in progress or the archival process was canceled).
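For users who do want to check a retrieved copy manually, the comparison could be sketched as follows. This is a minimal sketch, not an official DKRZ tool: the function name compare_sha512 is hypothetical, and it assumes that slk_helpers checksum -t sha512 prints the checksum as the last whitespace-separated field of its last output line, which this page does not specify. sha512sum comes from GNU coreutils.

```shell
# compare_sha512 ARCHIVE_PATH LOCAL_FILE   (hypothetical helper name)
# Assumption: the archived checksum is the last field of the last output line
# of 'slk_helpers checksum -t sha512'; adjust the parsing if the format differs.
compare_sha512() {
    archived=$(slk_helpers checksum -t sha512 "$1" | tail -n 1 | awk '{ print $NF }')
    local_sum=$(sha512sum "$2" | awk '{ print $1 }')
    if [ -z "$archived" ]; then
        # per the text above: no checksum means the file is not fully archived yet
        echo "no archived checksum available"
        return 2
    fi
    if [ "$archived" = "$local_sum" ]; then
        echo "match"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

A call would look like compare_sha512 ARCHIVE_PATH LOCAL_FILE; the return code is 0 on a match, 1 on a mismatch and 2 if no archived checksum was printed.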

slk_helpers exists#

$ slk_helpers exists RESOURCE_PATH

Check if the resource RESOURCE_PATH exists. The resource id is returned if it exists. exists works for files and namespaces.

slk_helpers gen_file_query#

$ slk_helpers gen_file_query [-R] RESOURCE1 [RESOURCE2 [RESOURCE3 [...]]]
  • --cached-only: Search for files in the HSM cache; Default: false

  • -n / --no-newline: Do not print a newline in the end of the output; Default: false

  • --not-cached: Search only for files which must not be in the HSM cache; currently ignored / no function; Default: false

  • -R, --recursive: generate a query which does a recursive search

  • --tape-barcodes BARCODE1 [BARCODE2 [BARCODE3 [...]]]: Search only for files stored on tapes with the provided barcodes

Generates a search query which can be used with slk search to perform a search for the resources RESOURCE1, RESOURCE2, … . These can be either files or namespaces. If a filename without path is provided, then the file will be searched for everywhere in the HSM. Filenames may contain regular expressions but no bash wildcards/globs. The path to a file must not contain regular expressions.

The user can specify whether files only from selected tapes (--tape-barcodes ...), from the HSM cache (--cached-only) or not in the HSM cache (--not-cached) are to be retrieved.

Detailed examples and explanations are given in Generate search queries.
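As a small illustration of how the generated query is passed on to slk search, the two commands can be chained as sketched below. The function name search_files is hypothetical. The query is stored in a variable and passed in double quotes so that the shell does not split it or re-interpret characters such as $ inside it.

```shell
# search_files [gen_file_query options] RESOURCE...   (hypothetical helper name)
# Generates a query with gen_file_query and hands it to 'slk search' unchanged.
search_files() {
    query=$(slk_helpers gen_file_query "$@") || return 1
    # double quotes keep the query a single argument; a variable's content
    # is not expanded a second time, so '$' operators stay intact
    slk search "$query"
}
```

For example, search_files -R NAMESPACE_PATH would run a recursive search below NAMESPACE_PATH.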

slk_helpers gen_search_query#

$ slk_helpers gen_search_query [-R] fieldname=value [fieldname=value [fieldname=value [...]]] --search-query '[existing search query]'
  • fieldname commonly consists of schema.field except when you search for a path or a smart_pool; see Reference: metadata schemata for all available metadata fields and their types

  • value is the value to search for; gen_search_query converts it to the correct type if needed

  • =: instead of = also <, >, <= and >= can be used; please set the whole condition 'fieldname<value' in quotation marks if another operator than = is used

  • --search-query [...]: insert an existing search query which is connected via an and operator with the newly generated search query

  • -R, --recursive: generate a query which does a recursive search when the metadata fieldname path is used; -R has no effect if path is not used

Generates a search query which can be used with slk search to perform a search for files which fulfill the provided conditions.

Detailed examples are given in Generate search queries.

slk_helpers gfbt#

please see group_files_by_tape

slk_helpers group_files_by_tape#

$ slk_helpers group_files_by_tape (RESOURCE1 [RESOURCE2 [RESOURCE3 [...]]]|--search-id SEARCH_ID|--search-query SEARCH_QUERY) [-R] [-l|--list] [-c|--count-files] [--gen-search-query|--run-search-query] [--print-tape-barcode|--print-tape-id] [--print-tape-status] [--json|--json-pretty] [(--smtnps|--set-max-tape-number-per-search) N]

RESOURCE1 [RESOURCE2 [RESOURCE3 [...]]] (<list of GNS paths>) or --search-id SEARCH_ID or --search-query SEARCH_QUERY is mandatory as input. These input types cannot be combined.

select type of input:

  • RESOURCE1 [RESOURCE2 [RESOURCE3 [...]]]: provide one or more paths to files or directories; directories only work with -R; filenames might contain regular expressions

  • -R, --recursive: Search namespaces recursively for input files

  • --search-id SEARCH_ID: Use an existing search as input

  • --search-query SEARCH_QUERY: Use a search query as input

select output format:

  • none: print a human-readable list with one tape per row

  • --count-tapes: only print the number of tapes; two lines are printed

  • --json: print the output as JSON (one line; see --json-pretty for pretty json)

  • --json-pretty: print the output as pretty JSON

  • --print-resource-id: print the resource id for each file instead of its path; is ignored when --gen-search-query, --run-search-query, --full or --count-files are set.

  • -v / --print-progress: verbosity level 1; print information on the progress of performed searches and similar

  • -vv: verbosity level 2; print more detailed information on the progress of performed searches and similar

select what should be done (basic):

  • -d / --details: print details per tape; implies --print-tape-barcode and --print-tape-status

  • -c / --count-files: counts the files per tape and prints this number instead of a file list

  • -f / --full: print details and run a search per tape; implies --print-tape-barcode, --print-tape-status and --run-search-query

select what should be done (advanced):

  • --gen-search-query: generate and print search query strings instead of the lists of files per tape

  • --run-search-query: generate and run search query strings; print the resulting search ID instead of the lists of files per tape

  • --print-tape-barcode: print the tape barcode on the far left followed by a :, Default: false

  • --print-tape-id: print the tape id on the far left followed by a :, Default: false

  • --print-tape-status: print the status (AVAILABLE, BLOCKED or ERRORSTATE) of the tape of each file group. Additional special statuses are UNAVAILABLE and UNCLEAR. The meaning of each status is given in Tape Stati below.

  • --smtnps <N> / --set-max-tape-number-per-search <N>: set the maximum number of tapes N which are used per search; default: 1; max: 2

Receives a list of files or a search id as input. Looks up which files are stored in the HSM cache and which are stored only on tape. Files on tape are grouped by tape: each line of the output contains all files which are on one tape. To see the tape barcode and the tape status, use --print-tape-barcode and --print-tape-status, respectively. The flag --details implies both. The meaning of each status is given in Tape Stati below. The user can directly create a search query for retrieving all files from one tape (--gen-search-query) or directly run this search (--run-search-query). The flag --full implies --run-search-query and --details. Additionally, the user can set --set-max-tape-number-per-search 2 to run one search per two tapes.

Note

Please contact support@dkrz.de if you encounter a tape with ERRORSTATE.

Structure of the output (if --count-tapes is not set):

[    cached ["(AVAILABLE  )"]: (FILES_LIST|FILE_COUNT|SEARCH_QUERY|SEARCH_ID)]
[(      tape|TAPE_ID|TAPE_BARCODE) ["("TAPE_STATUS")"]":" (FILES_LIST|FILE_COUNT|SEARCH_QUERY|SEARCH_ID)]
...
[(      tape|TAPE_ID|TAPE_BARCODE) ["("TAPE_STATUS")"]":" (FILES_LIST|FILE_COUNT|SEARCH_QUERY|SEARCH_ID)]
[multi-tape ["(UNCLEAR    )"]: (FILES_LIST|FILE_COUNT|SEARCH_QUERY|SEARCH_ID)]
[not stored ["(UNAVAILABLE)"]: (FILES_LIST|FILE_COUNT|SEARCH_QUERY|SEARCH_ID)]

The row cached is only printed if cached data are available. Its status is always AVAILABLE. The row multi-tape is only printed if at least one file is stored on multiple tapes. The row not stored is only printed when files without storage information are present. Multiple rows with tape might be printed, one row per tape.

The output looks as follows when --count-tapes is set:

N tapes with single-tape files
M tapes with multi-tape files

Here, N is the number of tapes holding single-tape files (== number of normal tapes) and M is the number of tapes on which files of the multi-tape category are stored.
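The two-line --count-tapes output can be reduced to a single number with a short filter, sketched below. The function name sum_tape_counts is hypothetical. Note the assumption: the sum is only an upper bound on the number of distinct tapes, because this page does not state whether a tape can be counted in both lines.

```shell
# sum_tape_counts   (hypothetical helper name)
# Reads the two-line '--count-tapes' output from stdin and prints N + M.
# Assumption: the lines look exactly like the format shown above.
sum_tape_counts() {
    awk '/ tapes? with (single|multi)-tape files$/ { total += $1 }
         END { print total + 0 }'
}
```

It is meant to be used in a pipe, e.g. slk_helpers group_files_by_tape -R NAMESPACE_PATH --count-tapes | sum_tape_counts.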

slk_helpers hsm2json#

$ slk_helpers hsm2json [options] <GNS path>
  • --instant-metadata-record-output: if not set: read the metadata records of all specified files and print them when the last record has been read; if set: print each metadata record directly after it has been read. Requires -l/--write-json-lines to be set. Default: false

  • -o FILE, --outfile FILE: Write the output into a file instead of to stdout

  • -q, --quiet: print nothing to stdout (e.g. no summary), Default: false

  • -R, --recursive: export metadata from the HSM recursively (all files in sub-directories of the provided source path will be considered), Default: false

  • -r FILE, --restart-file FILE: set a restart file in which the processed metadata entries are listed (if restart file exists, listed files will be skipped)

  • -s SCHEMA[,SCHEMA[...]], --schema SCHEMA[,SCHEMA[...]]: export only metadata fields of listed schemata (comma-separated list without spaces)

  • -v, --verbose: activate verbose mode, Default: false

  • -l, --write-json-lines: write JSON-lines instead of normal JSON, Default: false

  • -m MODE, --write-mode MODE: select write mode when -o/--outfile is set, Default: ERROR, Possible Values: [OVERWRITE, ERROR]

  • --print-summary: print summary on how many metadata records have been processed

  • --write-compact-json: do not print metadata as pretty but as compact JSON; default is pretty JSON

Extracts metadata from HSM file(s) and returns it as JSON. See JSON structure for/of metadata import/export for details.

slk_helpers hostname#

$ slk_helpers hostname

Prints the hostname to which slk is currently connected or to which slk will connect. It should be archive.dkrz.de. This is the default value on each Levante node. You can override the default hostname by exporting the environment variable SLK_HOSTNAME (e.g. by export SLK_HOSTNAME=stronglink.hsm.dkrz.de on bash).

slk_helpers iscached#

$ slk_helpers iscached [-v] [-vv] (RESOURCE_PATH [RESOURCE_PATH [...]]|--resource-id RESOURCE_ID|--search-id SEARCH_ID) [-R]
  • -R: search recursively in RESOURCE_PATH for files if RESOURCE_PATH is a namespace/directory

  • --resource-id: check caching status of file with provided RESOURCE_ID instead of RESOURCE_PATH and SEARCH_ID; default: -1

  • --search-id SEARCH_ID: check the caching status of all files represented by provided SEARCH_ID instead of file with RESOURCE_PATH and a RESOURCE_ID; default: -1

  • -v: verbose mode; print list of non-cached files (== non-matching files) and a summary line

  • -vv: double verbose mode; print list of checked files incl. their status (is cached and is not cached) and a summary line

Note

Please provide either RESOURCE_PATH or --search-id SEARCH_ID or --resource-id RESOURCE_ID.

Checks if the resource RESOURCE_PATH is stored in the HSM cache. Accepts multiple RESOURCE_PATHs. The user is informed via a text message whether RESOURCE_PATH exists. Additionally, the exit code will be 0 if the resource is in the cache and 1 if not (exit code: read the variable $? directly after the slk_helpers call). When --search-id SEARCH_ID is set, more than one file might be checked. If at least one file is not cached, 1 is returned. 0 is only returned when all files are cached.

If a file is not stored in the cache, it is only stored on tape. Retrievals from tape take considerably longer than retrievals from cache.
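The exit code described above can be used directly in scripts. The following is a minimal sketch with a hypothetical function name, cache_state. Exit codes other than 0 and 1 are not documented on this page, so they are reported generically.

```shell
# cache_state RESOURCE_PATH [RESOURCE_PATH ...]   (hypothetical helper name)
# Maps the exit code of 'slk_helpers iscached' to a short word.
cache_state() {
    # the text output is discarded here; only the exit code is evaluated
    slk_helpers iscached "$@" > /dev/null 2>&1 && rc=0 || rc=$?
    case "$rc" in
        0) echo "cached" ;;
        1) echo "not cached" ;;
        *) echo "error (exit code $rc)" ;;   # undocumented codes: assumption
    esac
}
```

A script could, for instance, only start a retrieval immediately when cache_state prints cached and otherwise plan for the longer tape access.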

slk_helpers is_admin_session#

$ slk_helpers is_admin_session

Check if the user is currently logged in as admin to StrongLink. Not useful for normal users. Might be used to check whether a connection to StrongLink is possible.

slk_helpers is_on_tape#

$ slk_helpers is_on_tape [-v] [-vv] (RESOURCE_PATH [RESOURCE_PATH [...]]|--resource-id RESOURCE_ID|--search-id SEARCH_ID) [-R]
  • -R: search recursively in RESOURCE_PATH for files if RESOURCE_PATH is a namespace/directory

  • --resource-id: check tape storage status of file with provided RESOURCE_ID instead of RESOURCE_PATH and SEARCH_ID; default: -1

  • --search-id SEARCH_ID: check the tape storage status of all files represented by provided SEARCH_ID instead of file with RESOURCE_PATH and a RESOURCE_ID; default: -1

  • -v: verbose mode; print list of files not on tape (== non-matching files) and a summary line

  • -vv: double verbose mode; print list of checked files incl. their status (is on tape and is not on tape) and a summary line

Note

Please provide either RESOURCE_PATH or --search-id SEARCH_ID or --resource-id RESOURCE_ID.

Checks if the resource RESOURCE_PATH is stored on tape. If a file is stored both on tape and in the cache, this command returns true / on tape. Accepts multiple RESOURCE_PATHs. The user is informed via a text message whether RESOURCE_PATH exists. Additionally, the exit code will be 0 if the resource is stored on a tape and 1 if not (exit code: read the variable $? directly after the slk_helpers call). When --search-id SEARCH_ID is set or RESOURCE_PATH / --resource-id RESOURCE_ID is a namespace, more than one file might be checked. If at least one file is not on tape, 1 is returned. 0 is only returned when all files are on tape.

If a file is not stored on tape then it is only stored in cache. Based on this command’s output it is not possible to determine whether a file is also stored in the HSM cache or not.

slk_helpers json2hsm#

$ slk_helpers json2hsm [options] <SL-JSON metadata file> <GNS path>
  • -l, --expect-json-lines: consider the input file to be JSON-lines instead of normal JSON, Default: false

  • --ignore-non-existing-metadata-fields: if set: if a metadata field of the source metadata record does not exist in StrongLink, this metadata field is skipped. If not set: throw an error and exit as soon as a source metadata field does not exist in StrongLink. If this flag is not set but -k/--skip-bad-metadata-sets is set, then metadata records with non-existing metadata fields will be skipped. Default: false

  • --instant-metadata-record-update: if not set: read the whole JSON file and collect all metadata updates => apply all updates in the end; if two metadata records exist for one resource, this will become apparent before any metadata are written; if set: write each metadata record to StrongLink directly after it has been read from the JSON file; if two metadata records exist for one resource, the first metadata record will be written to StrongLink and the duplication will remain undetected until the duplicate record is read from JSON. Default: false

  • -q, --quiet: print nothing to stdout (e.g. no summary), Default: false

  • -r FILE, --restart-file FILE: set a restart file in which the processed metadata entries are listed (if restart file exists, listed files will be skipped)

  • -s SCHEMA[,SCHEMA[...]], --schema SCHEMA[,SCHEMA[...]]: import only metadata fields of listed schemata (comma-separated list without spaces)

  • -k, --skip-bad-metadata-sets: skip damaged / incomplete metadata sets [default: throw error], Default: false

  • -v, --verbose: activate verbose mode, Default: false

  • -m MODE, --write-mode MODE: select write mode for metadata, Default: OVERWRITE, Possible Values: OVERWRITE, KEEP, ERROR, CLEAN (CLEAN: first, delete all metadata from the target schema and, then, write new metadata)

Reads metadata from a JSON file and writes it to archived files in the HSM. Uses relative paths from metadata records plus the base path provided by the user to identify target files. See JSON structure for/of metadata import/export for details.

slk_helpers job_exists#

$ slk_helpers job_exists JOB_ID

Check if a tape read job with the given ID exists.

slk_helpers job_queue#

$ slk_helpers job_queue
  • -i <INTERPRET_TYPE> / --interpret <INTERPRET_TYPE>: interpret the length of the StrongLink recall job queue; possible values for INTERPRET_TYPE:

      • RAW / R: same as argument not set

      • TEXT / T: print short textual interpretation of the queue status => none, short, medium, long, jammed

      • DETAILS / D: print detailed textual interpretation of the queue status

      • NUMERIC / N: print a number representing the queue status => 0 (==none), 1 (==short), …, 4 (==jammed)

Prints the length or the status of the queue of tape read jobs (recall jobs). The output looks like this:

$ slk_helpers job_queue
total read jobs: 110
active read jobs: 12
queued read jobs: 98

$ slk_helpers job_queue --interpret N
3

$ slk_helpers job_queue --interpret T
long

or like this:

$ slk_helpers job_queue
total read jobs: 4
active read jobs: 4
queued read jobs: 0

$ slk_helpers job_queue --interpret N
0

$ slk_helpers job_queue --interpret T
none

$ slk_helpers job_queue --interpret D
no queue, waiting time in the queue: none
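The numeric interpretation lends itself to scripted checks, for example before submitting further recalls. The sketch below uses a hypothetical function name, queue_at_most. The default threshold of 1 (== short) is an arbitrary choice for illustration, not a DKRZ recommendation.

```shell
# queue_at_most [MAX_LEVEL]   (hypothetical helper name)
# Succeeds (exit code 0) if the numeric queue status is <= MAX_LEVEL.
# Levels as listed above: 0 (none), 1 (short), ..., 4 (jammed).
queue_at_most() {
    max_level=${1:-1}   # default threshold 1 is an assumption of this sketch
    level=$(slk_helpers job_queue --interpret N) || return 2
    [ "$level" -le "$max_level" ]
}
```

Used as a condition, if queue_at_most 2; then ...; fi would run the branch only while the queue is at most medium.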

slk_helpers job_report#

Warning

We recommend using the command result_verify_job instead of this one (slk_helpers job_report). We strongly suggest reading Reference: StrongLink verify reports prior to evaluating a verify report printed by this command.

$ slk_helpers job_report JOB_ID [(-o|--outfile) OUTFILE] [(-f|--force-overwrite)] [--return-incomplete-report]
  • -f / --force-overwrite: overwrite OUTFILE if -o / --outfile is set and OUTFILE already exists; error is thrown if file exists and -f is not set

  • -o / --outfile OUTFILE: write the job report into OUTFILE instead of printing it to stdout

  • --return-incomplete-report: try to print the job report even if the job has not finished yet; if job is not finished and --return-incomplete-report is not set, an error is thrown

Fetches the result of a verify job. We strongly suggest reading Reference: StrongLink verify reports prior to evaluating a verify report for the first time. Some errors or warnings are confusing and can be ignored. We plan to provide a more user-friendly version of slk_helpers job_report once we have collected more experience in using verify reports. Verify jobs are started via slk_helpers submit_verify_job.

slk_helpers job_status#

$ slk_helpers job_status JOB_ID

Check the status of a tape read job with the given ID. The status is one of these: ABORTED, QUEUED, PROCESSING, COMPLETED, SUCCESSFUL, FAILED and PAUSED. When the status is QUEUED then the place in the queue is appended in brackets, e.g.: QUEUED (12).

See Job Stati for descriptions of the job statuses.
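Because a queued job is reported with its queue position appended (e.g. QUEUED (12)), scripts that compare the status against a fixed word need to separate the two parts. A minimal sketch with a hypothetical function name, parse_job_status, reads the status line from stdin. It assumes the status is printed on the first output line in exactly the form described above.

```shell
# parse_job_status   (hypothetical helper name)
# Reads the output of 'slk_helpers job_status JOB_ID' from stdin and prints
# "STATE POSITION"; POSITION is "-" if no queue position is appended.
parse_job_status() {
    awk 'NR == 1 {
        pos = "-"
        # second field looks like "(12)": strip the brackets
        if ($2 ~ /^\([0-9]+\)$/) { pos = $2; gsub(/[()]/, "", pos) }
        print $1, pos
    }'
}
```

Typical usage would be slk_helpers job_status JOB_ID | parse_job_status, followed by reading the two fields, e.g. with read state position.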

slk_helpers has_no_flag_partial#

$ slk_helpers has_no_flag_partial (RESOURCE_PATH [RESOURCE_PATH [...]]|--resource-id RESOURCE_ID|--search-id SEARCH_ID) [-R] [-v|-vv]
  • -R: search recursively in RESOURCE_PATH for files if RESOURCE_PATH is a namespace/directory

  • --resource-id: check file with provided RESOURCE_ID instead of RESOURCE_PATH and SEARCH_ID; default: -1

  • --search-id SEARCH_ID: check all files represented by provided SEARCH_ID instead of file with RESOURCE_PATH and a RESOURCE_ID; default: -1

  • -v: single verbose mode; print list of files with flag (== non-matching files) and a summary line

  • -vv: double verbose mode; print list of checked files incl. their status (has no partial flag and has partial flag) and a summary line

Note

Please provide either RESOURCE_PATH or --search-id SEARCH_ID or --resource-id RESOURCE_ID.

Checks if a resource RESOURCE_PATH is flagged as “partial file” and prints the resource path if this is the case. --invert inverts the checking mechanism so that all files which are not flagged as “partial file” are printed. Accepts multiple RESOURCE_PATHs. Additionally, the exit code will be 0 if at least one match was found and 1 if no match was found (exit code: read the variable $? directly after the slk_helpers call).

slk_helpers metadata#

$ slk_helpers metadata RESOURCE_PATH
  • --alternative-output-format: different format to print metadata (each row is: schema.field: value), Default: false

Prints the available metadata of a resource. It is the counterpart of slk tag: slk tag sets metadata, whereas slk_helpers metadata prints it.

slk_helpers mkdir#

$ slk_helpers mkdir [-R] GNS_PATH
  • -p / --parents: create folders recursively if the parent folders do not exist; throw no error if the folder already exists (like ‘mkdir -p’); Default: false

  • -R: create folders recursively if the parent folders do not exist; throw an error if the folder already exists

Creates a namespace in an already existing namespace (== create basename GNS_PATH in dirname GNS_PATH). This command works like mkdir on a Linux terminal. Create nested namespaces recursively when -R is set. slk_helpers mkdir -p behaves like mkdir -p on the Linux terminal.

slk_helpers print_rcrs#

$ slk_helpers print_rcrs (RESOURCE_PATH|--resource-id RESOURCE_ID)
  • --resource-id: get rcrs of a file with given resource id instead of path

Gets the resource content records (rcrs) for a resource path or resource id. Some files which were archived by HPSS were split into two parts which were stored on different tapes. If these files are accessed via StrongLink, each file part gets its own checksum. No overall checksum is stored for the combined file. Therefore, slk_helpers checksum prints no checksums for such files. If you need to verify such split files after retrieval, you can get the size and checksum of each file part via this command and then split the file via split -b <SIZE> <FILE>. The command does not necessarily print the file part information in the correct order: the information on the second file part might be printed first.

slk_helpers resourcepath#

Warning

Might be deprecated soon. Please use slk_helpers resource_path instead.

$ slk_helpers resourcepath RESOURCE_ID

Gets path for a resource id

slk_helpers resource_path#

$ slk_helpers resource_path RESOURCE_ID

Gets path for a resource id

slk_helpers resource_permissions#

$ slk_helpers resource_permissions (RESOURCE_PATH|--resource-id RESOURCE_ID)
  • --resource-id: get permissions of a resource with given resource id instead of path; default: -1

  • --as-octal-number: Do not return the permissions as a combination of x, w, r and - but as a three-digit octal number.

Gets permissions for a resource path or resource id as a combination of x, w, r and -

slk_helpers resource_type#

$ slk_helpers resource_type (RESOURCE_PATH|--resource-id RESOURCE_ID)
  • --resource-id: get type of a file with given resource id instead of path; default: -1

Gets the resource type (FILE or NAMESPACE) for a resource path or resource id

slk_helpers result_verify_job#

$ slk_helpers result_verify_job [--header|--sources|--number-errors|--number-sources] <job_id>
  • --header: print header of the report instead of errors; Default: false

  • --number-errors: print number of errors; Default: false

  • --number-sources: print number of source resources; note: if one resource was targeted, it might be a file or a namespace; if the number of targeted resources is larger than one, all of them were files

  • --sources: print the sources (source resources, source namespace)

Prints verification errors collected by the verify job with the id job_id. “Verification” means that the target size and the actual size of each targeted file are compared. Mismatches between these two sizes cause a verification error. The full report of a verification job, which can be extracted via slk_helpers job_report <job_id>, might contain additional warnings and errors which are not relevant for the user.

slk_helpers search_incomplete#

Warning

This command is a work in progress and might change in the future.

$ slk_helpers search_incomplete <SEARCH_ID>

Prints out whether the search, to which the SEARCH_ID points, is incomplete (== still running) or not (== finished). A complete search might be successful or failed (see slk_helpers search_successful).

slk_helpers search_limited#

Warning

This command will be deprecated soon. Please use slk search instead.

$ slk_helpers search_limited "RQL Search Query"
$ slk_helpers search_limited 'RQL Search Query'

This command will conduct a background search for files that match the specific query specified using a query language syntax that was designed by StrongLink. The query language is described on the StrongLink query language page and in the StrongLink Command Line Interface Guide from page 6 onwards.

Note

Operators in queries start with a $. If a query is delimited by " then the $ has to be escaped by a leading \ (\$OPERATOR). Otherwise, the operator is interpreted as environment variable by the shell. Alternatively, use ' as delimiter.
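The two quoting styles can be compared directly in the shell. In the sketch below, the query itself (field resources.size, operator $gt and the threshold value) is only an illustrative assumption and is not taken from this page. The point is that both assignments yield the same string.

```shell
# single quotes: the shell leaves '$gt' untouched
q_single='{"resources.size": {"$gt": 1048576}}'

# double quotes: inner quotes and the '$' must be escaped with a backslash
q_double="{\"resources.size\": {\"\$gt\": 1048576}}"

# both variables now hold the identical query string
if [ "$q_single" = "$q_double" ]; then
    echo "identical"
fi
```

Without the backslash before $gt in the double-quoted form, the shell would substitute the (usually empty) variable gt and the operator would silently vanish from the query.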

A search result ID (search_id) will be returned if 1000 or fewer results were found. If more results were found, an error is printed and no search ID is returned. The number 1000 refers to the total number of results, some of which might not be visible to the user. The search ID can be used to list and retrieve files from the archive (see below).

One might need the user or group ids of respective users/groups to search files belonging to them. These ids are obtained as follows.

Get user id:

# get your user id
$ id -u

# get the id of any user
$ id -u USER_NAME

# print the passwd entry of any user (the third field is the user id)
$ getent passwd USER_NAME
#  OR print only the user id
$ getent passwd USER_NAME | awk -F: '{ print $3 }'

# get user name from user id
$ getent passwd USER_ID | awk -F: '{ print $1 }'

Get group id:

# print the group entry for a group name (the third field is the group id)
$ getent group GROUP_NAME
#  OR print only the group id
$ getent group GROUP_NAME | awk -F: '{ print $3 }'

# get group name from group id
$ getent group GROUP_ID | awk -F: '{ print $1 }'

# list all groups of which you are a member, together with their ids
$ id

Please see our documentation of specific Usage Examples (slk Usage Examples) and the StrongLink Command Line Interface Guide for exemplary calls of slk search. These calls can be used unchanged with slk_helpers search_limited.

Note

slk_helpers search_limited counts all files and namespaces that match the search query and that the current user is allowed to see/read. slk list lists only the respective files. Therefore, slk_helpers search_limited might print an error that more than 1000 resources were found although there are fewer than 1000 matches for which the user has read permissions. Moreover, different users might get different output from slk list for the same search id.

slk_helpers search_successful#

Warning

This command is a work in progress and might change in the future.

$ slk_helpers search_successful <SEARCH_ID>

Prints out whether the search, to which the SEARCH_ID points, was successful or not. An unsuccessful search might have failed or might still be incomplete (see slk_helpers search_incomplete).

slk_helpers searchid_exists#

$ slk_helpers searchid_exists <SEARCH_ID>

Prints out whether the provided search ID exists or not.

slk_helpers session#

$ slk_helpers session

Prints until when the current slk session is valid.

slk_helpers size#

$ slk_helpers size (RESOURCE_PATH|--resource-id RESOURCE_ID) [-R|--recursive] [--pad-spaces-left WIDTH] [-v|-vv]
  • --pad-spaces-left <width>: pad spaces on the left of the printed size so that the total width (spaces + number) is width; default: -1 (no padding)

  • -R / --recursive: calculate folder size by summing sizes of contained files recursively

  • --resource-id: get size of a file with given resource id instead of path; default: -1

  • -v: single verbose mode; print sizes of all namespaces recursively

  • -vv: double verbose mode; print sizes of all resources recursively

Returns the file size in bytes. If a namespace / directory is the target and -R / --recursive is not set, 0 is returned. If a namespace / directory is the target and -R / --recursive is set, the size is calculated recursively. If the resource does not exist, an error is printed and exit code 1 is returned. All other errors cause an exit code of 2.
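The three documented exit codes can be distinguished in a wrapper, as in the following sketch. The function name size_or_reason is hypothetical. It assumes that, on success, the size is the only text printed to stdout when neither -v nor -vv is set.

```shell
# size_or_reason (RESOURCE_PATH|--resource-id RESOURCE_ID) [-R]
# (hypothetical helper name)
# Prints the size on success; otherwise prints a reason to stderr.
# The exit code of slk_helpers size is passed on unchanged.
size_or_reason() {
    bytes=$(slk_helpers size "$@" 2>/dev/null) && rc=0 || rc=$?
    case "$rc" in
        0) echo "$bytes" ;;
        1) echo "resource does not exist" >&2 ;;
        *) echo "other error (exit code $rc)" >&2 ;;
    esac
    return "$rc"
}
```

A caller can then write, for example, total=$(size_or_reason -R NAMESPACE_PATH) || echo "size lookup failed".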

slk_helpers submit_verify_job#

$ slk_helpers submit_verify_job [-v] RESOURCE_PATH [RESOURCE_PATH [...]] [(-R|--recursive)] [--save-mode]
$ slk_helpers submit_verify_job [-v] --resource-ids RESOURCE_ID [RESOURCE_ID [...]] [(-R|--recursive)] [--save-mode]
$ slk_helpers submit_verify_job [-v] --search-id SEARCH_ID [--save-mode] [--resume-on-page <n>]
$ slk_helpers submit_verify_job [-v] --search-query 'SEARCH_QUERY' [--save-mode]
# currently, only for admin users:
$ slk_helpers submit_verify_job [-v] (-i|--infile|--input) JSON_VERIFY_JOB_FILE
  • -R / --recursive: if a resource path or resource id points to a namespace, consider all resources in this namespace recursively

  • --resource-ids RESOURCE_ID [RESOURCE_ID [...]]: target resources by their resource ids instead of their resource paths

  • --resume-on-page <n>: resume the command and start submitting jobs starting with search result 1000 * n; internally, 1000 search results are on one ‘page’ and fetched by one request => therefore 1000 * n; you do not necessarily have read permissions for 1000 files per page

  • --save-mode: save mode suggested to be used in times of many timeouts; please do not regularly use this parameter; start one verify job per page of search results instead of one verify job for 50 pages of search results

  • --search-id SEARCH_ID: target resources which were found by this search

  • --search-query 'SEARCH_QUERY': target resources which will be found by a search defined by this search query

  • -v: verbose mode; print information on what is currently being done; recommended

  • -i / --infile / --input  JSON_VERIFY_JOB_FILE: a verify job can be described by a JSON expression; this JSON can be provided as file to this command

Starts a verify job for the selected files. Files for which the current user does not have read permissions are automatically ignored. No error message or warning is printed if files are ignored. The result of the verify job can currently be fetched as a verify report via slk_helpers job_report. We strongly suggest reading Reference: StrongLink verify reports prior to evaluating a verify report for the first time. The checked files are listed in the header of the verify report. We plan to provide a more user-friendly version of slk_helpers job_report once we have collected more experience in using verify reports.

One verify job is limited to 50000 resources because the run time of the job increases considerably for higher numbers of resources. If the verification of more than 50000 files is requested, multiple verify jobs are submitted. All job ids are printed out, one job id per line. A verify job targeting 50000 resources runs for approximately 6 minutes.
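To estimate in advance how many job ids to expect, the limit can be turned into a simple ceiling division. This sketch assumes that the targeted resources are packed into jobs of exactly up to 50000 each. The real number of jobs may differ, for example when --save-mode is set or when search result pages contain resources the user cannot read. The function name verify_job_count is hypothetical.

```shell
# verify_job_count N_RESOURCES   (hypothetical helper name)
# Prints ceil(N_RESOURCES / 50000), i.e. the expected number of verify jobs
# under the assumption that each job is filled up to the 50000 limit.
verify_job_count() {
    n=$1
    echo $(( (n + 49999) / 50000 ))
}
```

For 120000 resources, this yields 3 jobs.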

Verify jobs are submitted to the same StrongLink-internal queue as retrieval/recall jobs. Thus, if 100 retrieval/recall jobs wait in the queue, new verify jobs line up at the end and need to wait a long time. Non-admin users cannot submit new verify jobs if two or more jobs are already running under their user name. If one submit_verify_job call needs to submit multiple verify jobs whose number exceeds the limit of two jobs per user, the command is allowed to do so if at least one job slot is free. Thus, more than two verify jobs might be running in certain situations.

Note

The option -i / --infile / --input is currently deactivated for normal users because it allows setting a few verify job options which might harm the speed or stability of the StrongLink system. Once it is possible to restrict the use of these options, we might release this parameter for general use.

slk_helpers tape_barcode#

$ slk_helpers tape_barcode TAPE_ID

Returns the barcode of a tape with tape id TAPE_ID if it exists.

slk_helpers tape_exists#

$ slk_helpers tape_exists (TAPE_ID|--tape-barcode TAPE_BARCODE)

Returns whether the tape with tape id TAPE_ID or tape barcode TAPE_BARCODE exists in the tape library or not.

slk_helpers tape_id#

$ slk_helpers tape_id TAPE_BARCODE

Returns the ID of a tape with tape barcode TAPE_BARCODE if it exists.

slk_helpers tape_status#

$ slk_helpers tape_status [--details] (TAPE_ID|--tape-barcode TAPE_BARCODE)
  • --details: print a more detailed description of the retrieval status (different states of AVAILABLE are possible)

Prints the status of a tape with tape id TAPE_ID or tape barcode TAPE_BARCODE for retrievals: AVAILABLE, BLOCKED or ERRORSTATE. The meaning of each status is given in Tape Stati below. Please contact support@dkrz.de if you encounter a tape with ERRORSTATE.

Tape Stati#

  • AVAILABLE: tape is fully available

  • BLOCKED: currently data is written onto the tape; recalls/retrievals targeting this tape will fail until the write process is finished; please wait a few hours

  • ERRORSTATE: tape is in an error state which needs to be reset; currently, no recall/retrieval from this tape is possible; please contact support@dkrz.de

  • UNAVAILABLE: only used in group_files_by_tape for files without storage information; no recall/retrieval possible

  • UNCLEAR: only used in group_files_by_tape for files stored on multiple tapes each; status of these tapes was not checked
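A script can use the tape status to decide whether a recall is worth attempting. The following is a minimal sketch, assuming that slk_helpers tape_status prints the bare status word to stdout. The function name and its return values are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a recall from a given tape can be attempted now.
# Assumption: tape_status prints AVAILABLE, BLOCKED or ERRORSTATE to stdout.

tape_ready_for_recall() {
    local tape_id=$1
    local status
    # The decision is based on the printed text, not on the exit code.
    status=$(slk_helpers tape_status "$tape_id") || true
    case "$status" in
        AVAILABLE*)
            return 0 ;;
        BLOCKED*)
            echo "tape $tape_id is being written; retry in a few hours" >&2
            return 1 ;;
        ERRORSTATE*)
            echo "tape $tape_id is in an error state; contact support@dkrz.de" >&2
            return 2 ;;
        *)
            echo "unexpected tape status: '$status'" >&2
            return 2 ;;
    esac
}
```

A recall script could call tape_ready_for_recall TAPE_ID first and postpone the recall if the function returns 1.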

Job Stati#

  • BLOCKED: job is blocked by another running job (please retry later; e.g. 60 min)

  • QUEUED: job is queued in StrongLink

  • PROCESSING: job is being processed (= files are read from tape)

  • PAUSED: job has been paused by a StrongLink admin; there is an issue with your job; please contact support@dkrz.de (data protection: StrongLink admins cannot view the job owner)

  • COMPLETED: job has been completed; was replaced by SUCCESSFUL and FAILED; might be returned in rare situations

  • SUCCESSFUL: job has been completed and was successful

  • FAILED: job has been completed and was not successful

  • ABORTED: job has been aborted by a StrongLink admin; there has been an issue with your job; please contact support@dkrz.de (data protection: StrongLink admins cannot view the job owner)

  • STOPPED: job has been stopped (very rare)

  • WAITING: job is waiting for something (very rare)

  • OTHER: other not clearly defined state (very rare)
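When a script polls a job, the job stati listed above can be mapped to a small set of follow-up actions. The following is a minimal sketch of such a mapping. The function name and the action labels are hypothetical and only illustrate one possible grouping.

```shell
#!/usr/bin/env bash
# Sketch: map a job status to a follow-up action for a polling script.
# The action labels are illustrative and not part of slk_helpers.

classify_job_status() {
    case "$1" in
        SUCCESSFUL)                echo "done-ok" ;;
        FAILED)                    echo "done-failed" ;;
        PAUSED|ABORTED)            echo "contact-support" ;;
        BLOCKED)                   echo "retry-later" ;;
        QUEUED|PROCESSING|WAITING) echo "in-progress" ;;
        # COMPLETED, STOPPED, OTHER and unknown values need a closer look
        *)                         echo "check-manually" ;;
    esac
}
```

COMPLETED is treated as check-manually here because it does not state whether the job was successful.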

Exit codes#

| command | task | exit code |
| --- | --- | --- |
| bad input command | always (redirected to slk help) | 2 |
| general (not help, version and session) | session expired | 2 |
| general (not help, version and session) | issue related to config file | 2 |
| general (not help, version and session) | connection timeout or connection could not be established | 3 |
| help | always | 0 |
| checksum | resource exists and has checksum | 0 |
| checksum | resource not found | 1 |
| checksum | requested checksum not available | 1 |
| checksum | resource path and resource ID provided | 2 |
| checksum | any other error except connection issue | 2 |
| exists | resource exists | 0 |
| exists | resource does not exist | 1 |
| exists | any other error except connection issue | 2 |
| export_metadata | any error except connection issue | 2 |
| gen_file_query | query successfully generated | 0 |
| gen_file_query | any other error except connection issue | 2 |
| gen_search_query | query successfully generated | 0 |
| gen_search_query | a field name or a schema name does not exist or a value cannot be converted | 1 |
| gen_search_query | any other error except connection issue | 2 |
| gfbt (same as group_files_by_tape) | files successfully grouped | 0 |
| gfbt (same as group_files_by_tape) | any error except connection issue | 2 |
| group_files_by_tape | files successfully grouped | 0 |
| group_files_by_tape | any error except connection issue | 2 |
| hostname | hostname is set and is as printed | 0 |
| hostname | any error except connection issue | 2 |
| hsm2json | metadata exported successfully | 0 |
| hsm2json | any error except connection issue | 2 |
| iscached | resource exists and is cached | 0 |
| iscached | resources exist and all of them are cached | 0 |
| iscached | resource exists and is not cached | 1 |
| iscached | resources exist and at least one is not cached | 1 |
| iscached | resource(s) do(es) not exist | 2 |
| iscached | resource path and resource ID provided | 2 |
| iscached | resource path and search ID provided | 2 |
| iscached | resource ID and search ID provided | 2 |
| iscached | any other error except connection issue | 2 |
| is_admin_session | login token exists and belongs to an admin user | 0 |
| is_admin_session | login token exists but belongs to a normal user | 1 |
| is_admin_session | no login token | 2 |
| is_admin_session | session expired | 2 |
| is_admin_session | any error except connection issue | 2 |
| is_on_tape | resource exists and is on tape | 0 |
| is_on_tape | resources exist and all of them are on tape | 0 |
| is_on_tape | resource exists and is not on tape | 1 |
| is_on_tape | resources exist and at least one is not on tape | 1 |
| is_on_tape | resource(s) do(es) not exist | 2 |
| is_on_tape | resource path and resource ID provided | 2 |
| is_on_tape | resource path and search ID provided | 2 |
| is_on_tape | resource ID and search ID provided | 2 |
| is_on_tape | any other error except connection issue | 2 |
| json2hsm | metadata imported successfully | 0 |
| json2hsm | any error except connection issue | 2 |
| job_exists | job exists | 0 |
| job_exists | job does not exist | 1 |
| job_exists | any error except connection issue | 2 |
| job_queue | number of jobs printed successfully | 0 |
| job_queue | any error except connection issue | 2 |
| job_status | status of the job printed successfully | 0 |
| job_status | job has failed or was aborted | 1 |
| job_status | any error except connection issue | 2 |
| has_no_flag_partial | no file is flagged as “partial file” | 0 |
| has_no_flag_partial | at least one file is flagged as “partial file” | 1 |
| has_no_flag_partial | any error except connection issue | 2 |
| list_search | search id correct and search results to print | 0 |
| list_search | search id correct but no results to print | 1 |
| list_search | any error except connection issue | 2 |
| metadata | resource exists and metadata available | 0 |
| metadata | resource does not exist | 1 |
| metadata | any error except connection issue | 2 |
| mkdir | namespace successfully created | 0 |
| mkdir | namespace with same name already exists | 1 |
| mkdir | any other error except connection issue | 2 |
| print_rcrs | sizes and checksums of all file parts printed | 0 |
| print_rcrs | file has 0 byte and no storage info | 0 |
| print_rcrs | one or more checksums not available | 1 |
| print_rcrs | file > 0 byte but has no storage info | 2 |
| print_rcrs | resource exists but is a namespace (folder) | 2 |
| print_rcrs | resource does not exist | 2 |
| print_rcrs | invalid combination of input parameters | 2 |
| print_rcrs | any other error except connection issue | 2 |
| resourcepath | resource with given ID exists | 0 |
| resourcepath | resource with given ID does not exist | 1 |
| resourcepath | any error except connection issue | 2 |
| resource_path | resource with given ID exists | 0 |
| resource_path | resource with given ID does not exist | 1 |
| resource_path | any error except connection issue | 2 |
| resource_permissions | resource with given ID exists | 0 |
| resource_permissions | resource with given ID or path does not exist | 1 |
| resource_permissions | resource path and resource ID provided | 2 |
| resource_permissions | any error except connection issue | 2 |
| resource_type | resource with given ID exists | 0 |
| resource_type | resource with given ID or path does not exist | 1 |
| resource_type | resource path and resource ID provided | 2 |
| resource_type | any error except connection issue | 2 |
| result_verify_job | job report successfully fetched and printed | 0 |
| result_verify_job | job id does not exist, job not finished or job not-successfully finished | 1 |
| result_verify_job | any error except connection issue | 2 |
| search_incomplete | search is incomplete (== search still running) | 0 |
| search_incomplete | search is complete (== search has finished) | 1 |
| search_incomplete | any error except connection issue | 2 |
| search_limited | search successfully performed | 0 |
| search_limited | any error except connection issue | 2 |
| search_successful | search was successful | 0 |
| search_successful | search failed or is still running (incomplete) | 1 |
| search_successful | any error except connection issue | 2 |
| searchid_exists | search id exists | 0 |
| searchid_exists | search id does not exist | 1 |
| searchid_exists | any error except connection issue | 2 |
| session | login token exists and is not expired | 0 |
| session | no login token | 1 |
| session | session expired | 1 |
| session | any error except connection issue | 2 |
| size | resource exists (file or namespace) | 0 |
| size | resource with given ID or path does not exist | 1 |
| size | resource path and resource ID provided | 2 |
| size | any other error except connection issue | 2 |
| submit_verify_job | verify job successfully submitted | 0 |
| submit_verify_job | user reached allowed job limit of two jobs | 1 |
| submit_verify_job | wrong combination of input parameters | 2 |
| submit_verify_job | any other error except connection issue | 2 |
| tape_barcode | tape exists | 0 |
| tape_barcode | tape does not exist | 1 |
| tape_barcode | any error except connection issue | 2 |
| tape_exists | tape exists | 0 |
| tape_exists | tape does not exist | 1 |
| tape_exists | tape barcode and tape ID provided | 2 |
| tape_exists | any error except connection issue | 2 |
| tape_id | tape exists | 0 |
| tape_id | tape does not exist | 1 |
| tape_id | any error except connection issue | 2 |
| tape_status | tape is available for reading | 0 |
| tape_status | tape is blocked / currently no reading | 1 |
| tape_status | tape barcode and tape ID provided | 2 |
| tape_status | any error except connection issue | 2 |
| version | always | 0 |
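In scripts, the exit codes listed above allow a "no" answer (1) to be told apart from an error (2) and from a connection problem (3). The following is a minimal sketch that uses iscached as an example, assuming it is called as slk_helpers iscached RESOURCE_PATH. The function name and the printed labels are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: translate the exit code of iscached into a label for scripting.
# The labels are illustrative. The exit code itself is passed through.

check_cached() {
    local path=$1
    local rc=0
    slk_helpers iscached "$path" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0) echo "cached" ;;              # file is in the cache
        1) echo "not-cached" ;;          # a recall is needed first
        3) echo "connection-problem" ;;  # timeout or no connection
        *) echo "error" ;;               # e.g. resource does not exist
    esac
    return "$rc"
}
```

The same pattern applies to other commands, but the meaning of exit code 1 differs per command, so the labels have to be adapted.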

Technical background of selected commands#

slk_helpers gen_file_query#

The search query is generated as follows: The input file list is taken and each path is separated into filename (like basename PATH) and directory (like dirname PATH). All filenames in the same directory are grouped and a regular expression is generated which finds exactly these files. Then this expression is linked via an and to the respective directory in which these files are located. This is done for each distinct directory in the input. These search expressions are linked via an or at the top level.

The command checks whether each directory exists in StrongLink and throws an error if it does not.

The resulting search query can be optimized in length by the user. We do not do this in the slk_helpers because it would add considerable complexity to the code.
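The grouping described at the start of this section can be sketched with awk. This is an illustration only and not the code used in slk_helpers. The JSON operators and field names ($and, $or, path, $gte, $max_depth, resources.name, $regex) are assumptions, the function name is hypothetical, and the sketch neither escapes special characters in file names nor checks whether the directories exist.

```shell
#!/usr/bin/env bash
# Sketch: build an "or of ands" search query from a list of file paths.
# Input: one absolute file path per line on stdin.
# The JSON operator and field names are assumptions for illustration.

gen_file_query_sketch() {
    awk '
    /^$/ { next }
    {
        # split the path into directory and file name
        n = split($0, parts, "/")
        name = parts[n]
        dir = substr($0, 1, length($0) - length(name) - 1)
        if (dir == "") dir = "/"
        # collect the file names per directory, keeping input order
        if (!(dir in files)) { order[++ndirs] = dir; files[dir] = name }
        else files[dir] = files[dir] "|" name
    }
    END {
        for (i = 1; i <= ndirs; i++) {
            d = order[i]
            # directory and file name expression are linked via "and"
            clause = sprintf("{\"$and\":[{\"path\":{\"$gte\":\"%s\",\"$max_depth\":1}},{\"resources.name\":{\"$regex\":\"%s\"}}]}", d, files[d])
            out = (i == 1) ? clause : out "," clause
        }
        # the per-directory expressions are linked via "or" at the top level
        if (ndirs == 1) print out
        else if (ndirs > 1) printf "{\"$or\":[%s]}\n", out
    }'
}
```

Calling printf '%s\n' /arch/a/f1.nc /arch/a/f2.nc /arch/b/g.nc | gen_file_query_sketch yields one and-expression for /arch/a covering both files and one for /arch/b, joined by a top-level or.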

Major Changes#

1.11.0 (2023-12-08)#

  • If files are in an unclear caching state, commands like iscached and is_on_tape will not exit with an error but throw a warning.

  • If files are in an unclear caching state, iscached will inform the user that files in unclear caching state exist and exit with an error. If -v or -vv is set, the files in unclear caching state will be listed.

Note

An unclear caching state should only occur while a file is being copied from tape to the HSM cache or when there is a connection issue between StrongLink and the HSM cache. Please contact support@dkrz.de when this happens.

1.10.2 (2023-11-29)#

  • updated verbose messages for size and result_verify_job commands

  • updated help text of size

  • removed debugging output from submit_verify_job

1.10.1 (2023-11-20)#

  • updated verbose messages for size command

1.10.0 (2023-11-17)#

  • better handling of connection timeouts with StrongLink

  • updated submit_verify_job:
    • updated output when the command does not submit any verify job

    • added a parameter --resume-on-page <n> to simplify resuming the command in times of many connection losses to StrongLink

    • added a parameter --save-mode to start verify jobs for only 1000 files or less in order to simplify restarting the command in times of many connection losses

  • new command result_verify_job:
    • list relevant errors of verify job (default; no special arguments)

    • list checked files (--sources)

    • get part of the header of the verify report (--header)

    • list number of errors and checked files (--number-errors and --number-sources)

  • extended command size by new parameters:
    • -R / --recursive for requesting the size of the content of folders recursively

    • --pad-spaces-left for space padding to the left in order to align file/namespace sizes when the command is called multiple times

1.9.10 (2023-10-23)#

  • add --quiet / -q to command total_number_search_results (hidden)

1.9.9 (2023-10-17)#

  • access constraints for requesting job information; non-admin users may only access:
    • VERIFY jobs of the current user or

    • COPY jobs, which do retrievals/recalls and were started by slk

  • commands submit_verify_job_files and submit_verify_job_namespace from previous release only allowed for admin users

  • new commands:
    • submit_verify_job: run a verify job for a provided set of files

    • is_admin_session: Check if the user is currently logged in as normal user or admin user

    • search_incomplete: Prints whether the search is incomplete (still running)

    • search_successful: Prints whether the search was successful

    • search_immediately: Creates search and returns search id immediately, even if search is not finished (hidden; only for specific use cases)

1.9.8 (2023-10-05)#

  • new commands (hidden because not final versions):
    • job_report: print a job report; e.g. of a verify job

    • submit_verify_job_files: submit a verify-job for a list of files (as paths) or of resource ids

    • submit_verify_job_namespace: submit a verify-job for a namespace (as path) or a resource/namespace id

    • print_rcrs: print size and checksums of file parts (some HPSS files are stored as two parts on two tapes)

  • catch all HTTP status codes >= 400 every time an HTTP request is sent

  • new job states: BLOCKED, PAUSED, STOPPED, WAITING, OTHER

1.9.7 (2023-08-16)#

  • mkdir has a new argument -p / --parent which is similar to -R but throws no error when target exists and is a namespace/folder; thus, it behaves like the Linux mkdir -p in the terminal

  • changed error message which mkdir prints when it receives a path to a file as target

1.9.6 (2023-08-02)#

  • gfbt / group_files_by_tape has new parameter --print-resource-id which will print resource IDs instead of file paths

1.9.5 (2023-07-14)#

  • bug fixes regarding some old HPSS files used in gfbt

1.9.4 (2023-07-13)#

  • bug fixes in command gen_search_query:
    • modified description of command

    • added new field tape_barcode

    • field smart_pool internally was compared against tape_barcode

    • value of field smart_pool is now checked against list of existing Smart Pools

1.9.3 (2023-07-05)#

  • fixed conversion of dates to seconds since 1970 instead of milliseconds since 1970; relevant for gen_search_query

  • gen_search_query now also understands the operators <, >, <= and >=

  • removed unnecessarily created instances of ObjectMapper

1.9.2 (2023-06-30)#

  • new command gen_search_query:
    • generates a JSON search query which can be run by slk search

    • accepts search conditions/fields like netcdf.Project='abc', resources.birth_time='2023-01-01T13:00:00' or path=/arch/bm0146/k204221

    • search conditions/fields are linked via and

    • user can provide another existing search query via --search-query which is linked via and to the other input

1.9.1 (2023-06-16)#

  • search_limited: did not recognize certain failed searches in the past

  • iscached:
    • no error thrown anymore when resource ids are inserted which represent namespaces

    • correct output is printed when only one file was provided and -v or -vv is set

    • fixed issue when checking caching status of 0 byte files

  • job_status:
    • new job stati FAILED and SUCCESSFUL replace old status COMPLETED; COMPLETED might still be used

    • returns exit code 1 if a job has status FAILED, ABORTED or ABORTING

  • new command is_on_tape:
    • same as iscached but checks if files are on tape

    • files, which are on tape and in the cache, are considered as being on tape

    • this is NOT the inverse of iscached => a file can be on tape and in the cache

  • optional verbose and summary output is now printed to stderr instead of stdout:
    • hsm2json: verbose output printed to stderr (but summary not)

    • gfbt / group_files_by_tape: print verbose output (diagnostic purpose) to stderr

    • search_limited: search status

  • these commands can handle searches of which one or more resources were deleted:
    • iscached

    • is_on_tape

    • has_no_flag_partial

    • hsm2json

    • gfbt / group_files_by_tape

  • list_search may list already deleted files (not checked for performance reasons)

1.9.0 (2023-05-16)#

  • changed exit codes to 3 when a timeout error is thrown or a connection cannot be established

  • command has_flag_partial has been renamed to has_no_flag_partial

  • has_no_flag_partial behaves the same as iscached

  • iscached:
    • prints list of not-cached files when -v is set (now: negative list + summary; past: summary)

    • prints a summary in the end when -vv is set (now: full file list + summary; past: full file list)

  • job_queue:
    • new argument --format which has the same meaning as -i / --interpret

    • new output format JSON / J

1.8.10 (2023-05-08)#

  • new command has_flag_partial to check whether a file is flagged as partial (incomplete) file

Warning

has_flag_partial was renamed to has_no_flag_partial in slk_helpers 1.9.0.

1.8.9 (2023-05-02)#

  • fixed an error in tape_status which was thrown when no barcode was provided

  • command job_queue has new optional argument -i <INTERPRET_TYPE> / --interpret <INTERPRET_TYPE> with these values for INTERPRET_TYPE (case insensitive):
    • RAW/R: same as argument not set

    • TEXT/T: print short textual interpretation of the queue status => none, short, medium, long, jammed

    • DETAILS/D: print detailed textual interpretation of the queue status

    • NUMERIC/N: print a number representing the queue status => 0 (==none), 1 (==short), …, 4 (==jammed)

1.8.8 (2023-04-25)#

  • new command searchid_exists

  • iscached now accepts a directory/namespace as input (with -R set)

1.8.7 (2023-04-13)#

  • changed exit code of checksum when a file is stored on more than one tape from 1 to 2

  • editorial changes in the changelog

1.8.6 (2023-04-06)#

  • minor bugfixes in the error output messages

  • properly exit when wrong parameters are provided (in some situations)

1.8.5 (2023-04-06)#

  • new flag --help to print the help for a specific command; e.g. slk_helpers --help mkdir will print the help for mkdir

  • new hidden flag --pid will print the Linux process id of the Java virtual machine

  • group_files_by_tape has new flags:
    • --set-max-tape-number-per-search <N> / --smtnps <N> which causes the searches to be run not for one tape but for a maximum of N tapes – only if less than 50 files are to be retrieved per search

    • -v (same as --print-progress) and -vv for verbose and double-verbose output, respectively

  • no command help is printed when a command is used the wrong way

  • fixed: commands which expect a list of Strings did not recognize wrong parameters but interpreted them as items of the list

1.8.4 (2023-04-05)#

  • iscached and size also accept resource ids (via flag --resource-id) in addition to resource paths

  • iscached:
    • also accepts search ids (via flag --search-id) in addition to resource paths and resource ids

    • got the flags -v and -vv for verbose and double verbose mode, respectively

  • resource_type and resource_permissions expect a resource path by default xor a resource id via --resource-id

  • internal changes related to the new class Resource

  • group_files_by_tape:
    • fixed handling of files without storage information

    • new parameter --search-query '<search_query>'

    • internal searches are performed differently, which is partly more efficient

  • gen_file_query:
    • new parameters --cached-only, --not-cached (currently not working) and --tape-barcodes TAPE1,TAPE2,...

    • bugs in the JSON output were fixed

    • properly deal with files without storage information

    • width of status in normal text output (value in brackets) increased by one

1.8.3 (2023-03-23)#

  • iscached properly prints the cache

  • changes in the code base: new classes Resource and Checksums

  • group_files_by_tape / gfbt has flags --json and --json-pretty

  • checksum did not work after update to 1.8.1

1.8.2 (2023-03-21)#

  • job_status: fixed a certain job status which caused job_status to fail

1.8.1 (2023-03-20)#

  • a file might be split into multiple parts, which are stored on separate tapes; this was not captured properly by the following commands and is fixed now:
    • checksum: prints an error when a file is split because no checksums are available for the overall file (only for the file parts)

    • gfbt: properly identifies files stored on multiple tapes

1.8.0 (2023-03-14)#

  • new commands:
    • resource_permissions: print permissions of a resource

    • resource_type: print type of a resource (‘namespace’ or ‘file’)

    • resource_path: same as resourcepath

    • tape_barcode: get tape barcode from tape id (barcode needed for search queries)

    • tape_id: get tape id from barcode

  • new arguments:
    • --tape-barcode is new for tape_exists and tape_status

    • --print-tape-barcode, -c/--count-files and --print-progress are new for group_files_by_tape / gfbt

  • search_limited has been deprecated; please use slk search

  • tests for tape_id, tape_barcode, tape_exists, resource_type, resource_permissions

1.7.6 (2023-03-07)#

  • minor bugfixes in the output of the command metadata

1.7.5 (2023-03-01)#

  • extended the command json2hsm by the argument -j/--json-string JSON_STRING which allows passing a JSON string directly to the command instead of writing it into a file. If a filename is provided in addition, an error is thrown.

  • the commands hsm2json and metadata have a new argument --print-hidden; they do not print the field netcdf.Data by default (and other sidecar data); these data are printed when the new argument --print-hidden is set

1.7.4 (2023-02-10)#

  • removed , in the output of group_files_by_tape

  • improved conversion of dates in hsm2json and json2hsm

  • hsm2json exports dates according to ISO 8601

  • change JSON metadata standard from 2.1.0 to 2.1.1
    • added JSON metadata key mime_type

1.7.3 (2023-02-06)#

  • added one missing internally used job status (PAUSING)

  • change JSON metadata standard from 2.0.0 to 2.1.0
    • added JSON metadata key protocol

    • improved usage of JSON metadata key location

  • changed code structure

  • restructured code file for metadata

1.7.2 (2023-02-01)#

  • added one missing internally used job status (ABORTING)

1.7.1#

  • fixed new tape status ERRORSTATE

1.7.0#

  • new commands: job_exists, job_status, job_queue

  • group_files_by_tape (and tape_status):
    • modified structure of the output

    • new tape status ERRORSTATE when the tape is in a bad state which needs intervention from the support

  • increased timeout for the time to establish a connection

  • minor restructuring of the code

1.6.0#

1.5.8#

  • fixed errors related to processing of JSON returned by StrongLink

  • restructured code

1.5.7#

changes from 1.4.0 to 1.5.7

  • json2hsm / import_metadata:
    • renamed command import_metadata to json2hsm

    • removed parameter --update-only-one-resource

    • --write-mode got new option CLEAN which cleans all metadata of the selected resource before setting the new metadata (clean == removes content)

    • print JSON formatted summaries when --print-json-summary is set

  • hsm2json / export_metadata:
    • renamed command export_metadata to hsm2json

    • hsm2json prints an export summary when new parameter --print-summary is set

    • hsm2json prints JSON formatted summaries when --print-json-summary is set

    • do not print metadata as pretty but as compact JSON when --write-compact-json is set

  • slk_helpers list_search: print search results continuously (in contrast to collecting all search results, first, before printing them altogether as slk list does it)

  • updated tests

1.4.0#

  • removed import_metadata_recursive

  • merge the three other import_metadata_* commands to import_metadata:
    • --update-only-one-resource PATH_OR_ID => like import_metadata_one_file

    • --use-res-id => like import_metadata_use_res_id

    • none of the previous flags => like import_metadata_use_abs_path

  • JSON structure of metadata was incremented from v1.0.0 to v2.0.0; v2.0.0 is equal to the output of slk tag -display RESOURCE

  • remove -Q/--fully-quiet flag (fully quiet; suppress error messages)

  • readme updated

1.3.x#

  • new commands:
    • export_metadata

    • import_metadata_one_file

    • import_metadata_recursive

    • import_metadata_use_abs_path (hidden; meant for expert users)

    • import_metadata_use_res_id (hidden; meant for expert users)

  • new flags / arguments
    • slk_helpers metadata now has --alternative-output-format

  • slk_helpers gen_file_query accepts a file list as a string in which the files are separated by newlines

  • minor bug fixes

1.2.x#

  • new commands
    • gen_file_query: create a query string to search files, which are provided as input

    • list_search: list search results (incl. path of resources)

    • updated exit codes

1.1.x#

  • new commands
    • iscached: prints out whether a file is cached (== quick access) or not

    • search_limited: like slk search but works only for searches that yield 1000 results or fewer

    • version: prints the version of slk_helpers