slk helpers: slk extension provided by DKRZ#

file version: 25 Jun 2025

current software versions: slk_helpers version 1.16.5

The slk_helpers are an extension to slk. slk is developed by StrongLink and is part of the StrongLink HSM software. The slk_helpers have been developed at DKRZ to provide useful functionality that is not included in slk. The slk_helpers are loaded together with slk via module load slk. If specific usage information is missing on this help page or if you encounter errors, please contact support@dkrz.de.

Note

StrongLink uses the term “namespace” or “global namespace” (gns). A “namespace” is comparable to a “directory” or “path” on a common file system.

slk_helpers#

$ slk_helpers (--pid|--help [COMMAND]|COMMAND ....)

--help: print help for COMMAND if specified and print general help otherwise
--pid: print the process id of the slk_helpers command

slk_helpers help#

print help

$ slk_helpers help

lists all commands

slk_helpers version#

print version

$ slk_helpers version

print the current slk_helpers version

slk_helpers checksum#

return checksums of resource; targets one resource only

$ slk_helpers checksum [-t CHECKSUM_TYPE] (RESOURCE_PATH|--resource-id RESOURCE_ID)

--resource-id: get the checksum(s) of the file with the given resource id instead of a path; default: -1
-t, --type: checksum_type (possible values: sha512, adler32); omit to print all available checksums

Prints the checksum(s) of a resource. If -t is set, the checksum of type CHECKSUM_TYPE is retrieved. Possible values are sha512 and adler32. If -t is not set, all available checksums are printed. It only works for files and not for namespaces. Namespaces do not have checksums.

StrongLink calculates two checksums of each archived file and stores them in the metadata. It compares the stored checksums with the file’s actual checksums at certain stages of the archival and retrieval process. Users normally do not need to verify checksums manually, but they can do so with this command. If a file has no checksum, it has not been fully archived yet (e.g. the copying is still in progress or the archival process was canceled).
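
A manual check of a retrieved file could look like the following sketch. It assumes that slk_helpers checksum -t sha512 prints the checksum as the last whitespace-separated token of its first output line; the function name verify_sha512 and its arguments are illustrative and not part of slk_helpers.

```shell
# Compare the sha512 checksum stored in StrongLink with a local copy.
# Assumption: the checksum is the last token of the first output line of
# 'slk_helpers checksum -t sha512'.
verify_sha512() {
    local gns_path="$1" local_file="$2"
    local archived actual

    archived=$(slk_helpers checksum -t sha512 "$gns_path" | awk 'NR==1 {print $NF}')
    if [ -z "$archived" ]; then
        # no checksum: e.g. the file is not fully archived yet
        echo "no sha512 checksum available for $gns_path" >&2
        return 2
    fi

    actual=$(sha512sum "$local_file" | awk '{print $1}')
    if [ "$archived" = "$actual" ]; then
        echo "OK: $local_file"
    else
        echo "MISMATCH: $local_file" >&2
        return 1
    fi
}
```

Possible call: verify_sha512 /arch/.../file01.nc /scratch/k/k204221/blub/file01.nc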

slk_helpers exists#

Check if resource exists. Targets one resource only. Returns resource ID if resource path is provided. The commands slk_helpers resource_ids and slk_helpers resource_features allow multiple resource paths as arguments and return one resource id per line.

$ slk_helpers exists (<RESOURCE_PATH>|--resource-id <RESOURCE_ID>)

--resource-id: expects resource id to be provided instead of resource path

Check if the resource RESOURCE_PATH exists. In addition to checking whether the resource exists, the command returns the resource id. exists works for files and namespaces.

slk_helpers gen_file_query#

generates a search query JSON string for provided resource list

$ slk_helpers gen_file_query [-R] RESOURCE1 [RESOURCE2 [RESOURCE3 [...]]]

--cached-only: Search for files in the HSM cache; Default: false
-n, --no-newline: Do not print a newline in the end of the output; Default: false
--not-cached: Search only for files which must not be in the HSM cache; currently ignored / no function; Default: false
-R, --recursive: generate a query which does a recursive search
--tape-barcodes BARCODE1 [BARCODE2 [BARCODE3 [...]]]: Search only for files stored on tapes with the provided barcodes

Generates a search query which can be used with slk search to perform a search for the resources RESOURCE1, RESOURCE2, … . These can be either files or namespaces. If a filename without path is provided, then the file will be searched for everywhere in the HSM. Filenames may contain regular expressions but no bash wildcards/globs. The path to a file must not contain regular expressions.

The user can specify whether files only from selected tapes (--tape-barcodes ...), from the HSM cache (--cached-only) or not in the HSM cache (--not-cached) are to be retrieved.

Detailed examples and explanations are given in Generate search queries.
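
As a minimal sketch, the generated query can be handed to slk search in one step. The wrapper name search_resources is hypothetical; it only forwards its arguments to gen_file_query.

```shell
# Build a query for the given resources and pass it to 'slk search'.
# All arguments are forwarded unchanged to gen_file_query.
search_resources() {
    local query
    query=$(slk_helpers gen_file_query "$@") || return 1
    # The query is a JSON string; the quotes keep it a single argument.
    slk search "$query"
}
```

Possible call: search_resources -R /arch/.../some_namespace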

slk_helpers gen_search_query#

generates a search query JSON string searching provided fields

$ slk_helpers gen_search_query [-R] fieldname=value [fieldname=value [fieldname=value [...]]] --search-query '[existing search query]'

fieldname commonly consists of schema.field except when you search for a path or a smart_pool; see Reference: metadata schemata for all available metadata fields and their types
value is the value to search for; gen_search_query converts it to the correct type if needed
=: instead of = also <, >, <= and >= can be used; please set the whole condition 'fieldname<value' in quotation marks if another operator than = is used
--search-query [...]: insert an existing search query which is connected via an and operator with the newly generated search query
-R, --recursive: generate a query which does a recursive search when the metadata fieldname path is used; -R has no effect if path is not used

Generates a search query which can be used with slk search to perform a search for files which fulfill the provided conditions.

Detailed examples are given in Generate search queries.
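
The quoting rule for operators other than = can be kept in one place with a small wrapper. This is a sketch only: namespace_query is a hypothetical name, and some_schema.some_field in the comment is a placeholder, not a real metadata field.

```shell
# Generate a recursive query for a namespace plus additional conditions.
# Each condition is passed as one quoted argument so that the shell does
# not treat < or > as a redirection, e.g.
#   namespace_query /arch/.../dir 'some_schema.some_field<100'
namespace_query() {
    local namespace="$1"
    shift
    slk_helpers gen_search_query -R "path=$namespace" "$@"
}
```

Because "$@" preserves every argument as a single word, conditions quoted at the call site reach gen_search_query unchanged.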

slk_helpers gfbt#

please see group_files_by_tape

slk_helpers group_files_by_tape#

check on which tapes the provided resources are stored and return a grouping of resources per tape

$ slk_helpers group_files_by_tape \
    (<RESOURCE_PATH_1> [<RESOURCE_PATH_2> [...]]|--resource-ids <RESOURCE_ID_1> [<RESOURCE_ID_2> [...]]|--search-id <SEARCH_ID>|--search-query <SEARCH_QUERY>) [-R] \
    [-l|--list] [-c|--count-files] [--gen-search-query|--run-search-query] [--print-tape-barcode|--print-tape-id] [--print-tape-status] [--json|--json-pretty] [(--smtnps|--set-max-tape-number-per-search) <N>] \
    [-d <DESTINATION_RETRIEVAL> [-ns]|-wf1 <DESTINATION_RETRIEVAL>] \
    [--write-resource-id [--append-output|--overwrite-output]]

Exactly one type of input is mandatory: <RESOURCE_PATH_1> [<RESOURCE_PATH_2> [...]] (a list of GNS paths), --resource-ids <RESOURCE_ID_1> [...], --search-id <SEARCH_ID> or --search-query <SEARCH_QUERY>. The input types cannot be combined.

select type of input:

<RESOURCE_PATH_1> [<RESOURCE_PATH_2> [...]]: provide one or more paths to files or directories; directories only work with -R; filenames might contain regular expressions
-R, --recursive: Search namespaces recursively for input files
--regex, --evaluate-regex-in-input:
--resource-ids <RESOURCE_ID_1> [<RESOURCE_ID_2> [...]]: Use resource ids as input instead of resource paths
--search-id <SEARCH_ID>: Use an existing search as input
--search-query <SEARCH_QUERY>: Use a search query as input

select output format:

none: print a human-readable list with one tape per row
--count-tapes: only print the number of tapes; two lines are printed
--json: print the output as JSON (one line; see --json-pretty for pretty json)
--json-pretty: print the output as pretty JSON
--print-resource-id: print the resource id for each file instead of its path; is ignored when --gen-search-query, --run-search-query, --full or --count-files are set.
-v, --print-progress: verbosity level 1; print information on the progress of performed searches and similar
-vv : verbosity level 2; print more detailed information on the progress of performed searches and similar
-wf1 <DESTINATION_RETRIEVAL>, --retrieval-workflow-1 <...>: shortcut for -dst <...>, --wrid, -ns, --details and --count-files => print resource number per tape, write resources ids into file with one file per tape, create config file for recall and retrieve watchers

select what should be done (basic):

-d, --details: print details per tape; implies --print-tape-barcode and --print-tape-status
-c, --count-files: counts the files per tape and prints this number instead of a file list
-f, --full: print details and run a search per tape; implies --print-tape-barcode, --print-tape-status and --run-search-query

select what should be done (advanced):

-ao, --append-output: when -wrid is set and target files already exist, append output to them (error otherwise)
-dst <DESTINATION_RETRIEVAL>, --destinationPath <DESTINATION_RETRIEVAL>: ignore files which exist already in DESTINATION_RETRIEVAL
--gen-search-query: generate and print search query strings instead of the lists of files per tape
-ns: preserve original namespace in destinationPath
-oo, --overwrite-output: when -wrid is set and target files already exist, overwrite them (error otherwise)
--print-tape-barcode: print the tape barcode on the far left followed by a :, Default: false
--print-tape-id: print the tape id on the far left followed by a :, Default: false
--print-tape-status: print the status (AVAILABLE, BLOCKED or ERRORSTATE) of the tape of each file group. Additional special statuses are UNAVAILABLE and UNCLEAR. The meaning of the statuses is given in Tape Stati below.
--run-search-query: generate and run search query strings; print the resulting search ID instead of the lists of files per tape
--smtnps <N>, --set-max-tape-number-per-search <N>: set the maximum number of tapes N which are used per search; default: 1; max: 2
-wrid, --write-resource-id: write resource ids per tape to text files with names files_tape_<tape barcode>.txt; further created files are files_all.txt (all resource ids), tapes.txt (all tapes for which the first type of file are created) and config.sh (parameters for watcher scripts); possibly, the files files_multipleTape.txt, files_notStored, files_ignored and/or files_cached are created

Receives a list of files or a search id as input. Looks up which files are stored in the HSM cache and which are not stored in the HSM cache but only on tape. Files on tape are grouped by tape: each line of the output contains all files which are on one tape. To see the tape barcode and the tape status, use --print-tape-barcode and --print-tape-status, respectively. The flag --details implies both. The meaning of the statuses is given in Tape Stati below. The user can directly create a search query for retrieving all files from one tape (--gen-search-query) or directly run this search (--run-search-query). The flag --full implies --run-search-query and --details. Additionally, the user can set --set-max-tape-number-per-search 2 to run one search per two tapes.

When you want to use the new slk_helpers retrieve and slk_helpers recall commands, you might choose to run gfbt with -wf1 <DESTINATION_RETRIEVAL>. You will receive multiple files – one file per tape – containing the ids of the resources stored on the respective tape. These files can be simply piped into the new recall or retrieve commands as follows:

$ slk_helpers gfbt -wf1 /scratch/k/k204221/blub
...
$ ls *.txt
files_all.txt
files_cached.txt
files_tape_C00652L6.txt
files_tape_C42350L6.txt
$ cat files_tape_C42350L6.txt | slk_helpers recall --resource-ids
1234567
$ cat files_cached.txt | slk_helpers retrieve --resource-ids -d /scratch/k/k204221/blub
...
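
The per-tape recall from the example above can be repeated for all tape lists with a loop. The following is a sketch under two assumptions: the files files_tape_<tape barcode>.txt are in the directory passed as first argument, and recall prints the job id on stdout as shown above. The function name recall_per_tape is illustrative.

```shell
# Submit one recall job per tape list written by 'gfbt -wf1' or '-wrid'.
# Prints "<list file> <job id>" for each submitted job.
recall_per_tape() {
    local dir="${1:-.}" list job_id
    for list in "$dir"/files_tape_*.txt; do
        # if no tape list exists, the glob stays literal: skip it
        [ -e "$list" ] || continue
        job_id=$(slk_helpers recall --resource-ids < "$list") || return 1
        echo "$list $job_id"
    done
}
```

Since recall can read from a maximum of four tapes at once, many submitted jobs will wait in the queue; the watcher scripts (see slk_helpers init_watchers) are the intended way to automate this step.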

Note

Please contact support@dkrz.de if you encounter a tape with ERRORSTATE.

Structure of the output (if --count-tapes is not set):

[    cached ["(AVAILABLE  )"]: (FILES_LIST|FILE_COUNT|SEARCH_QUERY|SEARCH_ID)]
[(      tape|TAPE_ID|TAPE_BARCODE) ["("TAPE_STATUS")"]":" (FILES_LIST|FILE_COUNT|SEARCH_QUERY|SEARCH_ID)]
...
[(      tape|TAPE_ID|TAPE_BARCODE) ["("TAPE_STATUS")"]":" (FILES_LIST|FILE_COUNT|SEARCH_QUERY|SEARCH_ID)]
[multi-tape ["(UNCLEAR    )"]: (FILES_LIST|FILE_COUNT|SEARCH_QUERY|SEARCH_ID)]
[not stored ["(UNAVAILABLE)"]: (FILES_LIST|FILE_COUNT|SEARCH_QUERY|SEARCH_ID)]

The row with cached is only printed if cached data are available. Its status is always AVAILABLE. The row multi-tape is only printed if at least one file is stored on multiple tapes. The row not stored is only printed when files without storage information are present. Multiple rows with tape might be printed, one row per tape.

The output looks as follows when --count-tapes is set:

N tapes with single-tape files
M tapes with multi-tape files

Where N is the number of tapes with single-tape-only files (== number of normal tapes) and M is the number of tapes on which files in the multi-tape category are stored.
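
For use in scripts, the two numbers can be extracted from the --count-tapes output with awk. This sketch relies only on the two-line format shown above; the function name tape_counts is hypothetical.

```shell
# Read the output of 'gfbt --count-tapes' from stdin and print both
# numbers on one line. Missing lines are reported as 0.
tape_counts() {
    awk '/tapes with single-tape files/ { single = $1 }
         /tapes with multi-tape files/  { multi = $1 }
         END { printf "single=%d multi=%d\n", single, multi }'
}
```

Possible call: slk_helpers gfbt -R /arch/.../some_namespace --count-tapes | tape_counts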

slk_helpers hostname#

return hostname to which slk_helpers connects

$ slk_helpers hostname

Prints the hostname to which slk is currently connected or to which slk will connect. It should be archive.dkrz.de. This is the default value on each Levante node. You can overwrite the default hostname by exporting the environment variable SLK_HOSTNAME (e.g. by export SLK_HOSTNAME=stronglink.hsm.dkrz.de on bash).

slk_helpers hsm2json#

export resource metadata and return them in JSON format

$ slk_helpers hsm2json [options] <GNS path>

--instant-metadata-record-output: not set: read the metadata records of all specified files and print them when the last record is read; if set: print a metadata record directly after it has been read. Needs -l/--write-json-lines to be set. Default: false
-o FILE, --outfile FILE: Write the output into a file instead of stdout
-q, --quiet: print nothing to stdout (e.g. no summary), Default: false
-R, --recursive: export metadata from the HSM recursively (all files in sub-directories of the provided source path will be considered), Default: false
-r FILE, --restart-file FILE: set a restart file in which the processed metadata entries are listed (if restart file exists, listed files will be skipped)
-s SCHEMA[,SCHEMA[...]], --schema SCHEMA[,SCHEMA[...]]: export only metadata fields of listed schemata (comma-separated list without spaces)
-v, --verbose: activate verbose mode, Default: false
-l, --write-json-lines: write JSON-lines instead of normal JSON, Default: false
-m MODE, --write-mode MODE: select write mode when -o/--outfile is set, Default: ERROR, Possible Values: [OVERWRITE, ERROR]
--print-summary: print summary on how many metadata records have been processed
--write-compact-json: do not print metadata as pretty but as compact JSON; default is pretty JSON

Extracts metadata from HSM file(s) and returns them in JSON structure. See JSON structure for/of metadata import/export for details.
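
A typical export might combine several of the options listed above. The wrapper below is a sketch; export_metadata is a hypothetical name, and the choice of JSON lines and OVERWRITE is only one possible combination.

```shell
# Export the metadata of all files below a namespace into a local file.
# -R: recurse into sub-namespaces, -l: write JSON lines,
# -o: write to a file, -m OVERWRITE: replace an existing output file
# (with the default write mode ERROR an existing file causes an error).
export_metadata() {
    local namespace="$1" outfile="$2"
    slk_helpers hsm2json -R -l -o "$outfile" -m OVERWRITE "$namespace"
}
```

Possible call: export_metadata /arch/.../some_namespace metadata.jsonl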

slk_helpers init_watchers#

Generates files required by the recall and retrieve watcher scripts (see slk_helpers start_watchers). Simplified version of slk_helpers gfbt.

$ slk_helpers init_watchers \
    (<RESOURCE_PATH_1> [<RESOURCE_PATH_2> [...]]|--resource-ids <RESOURCE_ID_1> [<RESOURCE_ID_2> [...]]|--search-id <SEARCH_ID>) \
    [-R] [--regex] \
    -d <DESTINATION_RETRIEVAL> [-ns] \
    [--append-output|--overwrite-output]

select type of input:

<RESOURCE_PATH_1> [<RESOURCE_PATH_2> [...]]: provide one or more paths to files or directories; directories only work with -R; filenames might contain regular expressions
-R, --recursive: Search namespaces recursively for input files
--regex, --evaluate-regex-in-input:
--resource-ids <RESOURCE_ID_1> [<RESOURCE_ID_2> [...]]: consider input as resource IDs instead of resource paths
--search-id <SEARCH_ID>: consider input as search ID (multiple search IDs not allowed)

select destination path for retrieval:

-d <DESTINATION_RETRIEVAL>, -dst <...>, --destination <...>: destination to which files are retrieved; files which already exist in the destination are ignored
-ns: preserve original namespace in destination path (appended to destination path)

what should happen when file lists already exist which init_watchers wants to write:

default: throw error when at least one of the new files already exists
-ao, --append-output: when target files already exist, append output to them
-oo, --overwrite-output: when target files already exist, overwrite them

Receives a list of resource paths, resource ids or a search id as input. Looks up which files are stored in the HSM cache and which are not stored in the HSM cache but only on tape. Files on tape are grouped by tape: each line of the output contains all files which are on one tape. Finally, multiple files are created in the current directory, which are required by the slk watchers to run.

$ slk_helpers init_watchers /arch/.../file01.nc /arch/.../file02.nc ... -d /scratch/k/k204221/blub -ns
...
$ ls *.txt
files_all.txt
files_cached.txt
files_tape_C00652L6.txt
files_tape_C42350L6.txt
$ cat files_tape_C42350L6.txt | slk_helpers recall --resource-ids
1234567
$ cat files_cached.txt | slk_helpers retrieve --resource-ids -d /scratch/k/k204221/blub
...

Note

Please contact support@dkrz.de if you encounter a tape with ERRORSTATE.

slk_helpers initialize_watchers#

please see init_watchers

slk_helpers iscached#

check if resources are cached

$ slk_helpers iscached [-v] [-vv] (RESOURCE_PATH [RESOURCE_PATH [...]]|--resource-id[s] RESOURCE_ID [RESOURCE_ID [...]]|--search-id SEARCH_ID) [-R]

-R: search recursively in RESOURCE_PATH for files if RESOURCE_PATH is a namespace/directory
--resource-id, --resource-ids: check caching status of file(s) with provided RESOURCE_ID(s) instead of RESOURCE_PATH and SEARCH_ID; multiple resource ids are allowed
--search-id SEARCH_ID: check caching status of all files represented by provided SEARCH_ID instead of file with RESOURCE_PATH and a RESOURCE_ID; only one search id is allowed
-v: verbose mode; print list of non-cached files (== non-matching files) and a summary line
-vv: double verbose mode; print list of checked files incl. their status (is cached and is not cached) and a summary line

Checks if the resources are stored in the HSM cache. Accepts multiple RESOURCE_PATHs, multiple RESOURCE_IDs or one SEARCH_ID. These three input types are mutually exclusive. A SEARCH_ID might point to more than one file. If RESOURCE_PATH or RESOURCE_ID points to a namespace instead of a file, please set -R and files contained in the namespace are checked recursively.

The user is informed via a text message whether provided resource(s) exist(s). Additionally, the exit code will be 0 if all checked resources are cached and 1 if at least one resource is not cached (exit code: get the variable $? directly after the slk call).

If a file is not stored in the cache then it is only stored on tape. Retrievals from tape will take considerably longer than retrievals from cache.
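
The exit code can be evaluated in a script as sketched below. Only the exit codes 0 and 1 are described above; treating every other value as an error is an assumption, and cache_state is a hypothetical function name.

```shell
# Translate the exit code of 'iscached' into a short text.
# 0: all checked resources are cached, 1: at least one is not cached.
cache_state() {
    local rc=0
    slk_helpers iscached "$@" > /dev/null || rc=$?
    case "$rc" in
        0) echo "cached" ;;
        1) echo "not cached" ;;
        *) echo "unexpected exit code $rc" >&2; return "$rc" ;;
    esac
}
```

Possible call: cache_state -R /arch/.../some_namespace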

slk_helpers is_admin_session#

check if the current user has admin permissions in StrongLink

$ slk_helpers is_admin_session

Check if the user is currently logged in as admin to StrongLink. Not useful for normal users. Might be used to check whether a connection to StrongLink is possible.

slk_helpers is_on_tape#

check if resources are stored on tape

$ slk_helpers is_on_tape [-v] [-vv] (RESOURCE_PATH [RESOURCE_PATH [...]]|--resource-id RESOURCE_ID|--search-id SEARCH_ID) [-R]

-R: search recursively in RESOURCE_PATH for files if RESOURCE_PATH is a namespace/directory
--resource-id: check tape storage status of file with provided RESOURCE_ID instead of RESOURCE_PATH and SEARCH_ID; default: -1
--search-id SEARCH_ID: check tape storage status of all files represented by provided SEARCH_ID instead of file with RESOURCE_PATH and a RESOURCE_ID; default: -1
-v: verbose mode; print list of files not on tape (== non-matching files) and a summary line
-vv: double verbose mode; print list of checked files incl. their status (is on tape and is not on tape) and a summary line

Note

Please provide either RESOURCE_PATH or --search-id SEARCH_ID or --resource-id RESOURCE_ID.

Checks if the resource RESOURCE_PATH is stored on tape. If a file is stored on tape and in the cache, this command will return true / on tape. Accepts multiple RESOURCE_PATHs. The user is informed via a text message whether RESOURCE_PATH exists. Additionally, the exit code will be 0 if the resource is stored on a tape and 1 if not (exit code: get the variable $? directly after the slk call). When --search-id SEARCH_ID is set or RESOURCE_PATH / --resource-id RESOURCE_ID is a namespace, more than one file might be checked. If at least one file is not on tape, 1 is returned. 0 is only returned when all files are on tape.

If a file is not stored on tape then it is only stored in cache. Based on this command’s output it is not possible to determine whether a file is also stored in the HSM cache or not.
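
Since is_on_tape alone does not reveal the cache state, both exit codes can be combined. The sketch below does this for a single file; the function name storage_location is hypothetical, and any non-zero exit code (including possible errors) is simply treated as "no".

```shell
# Classify where a single file is stored by combining the exit codes of
# 'is_on_tape' and 'iscached'.
storage_location() {
    local path="$1" on_tape=no cached=no
    slk_helpers is_on_tape "$path" > /dev/null 2>&1 && on_tape=yes
    slk_helpers iscached "$path" > /dev/null 2>&1 && cached=yes
    case "$on_tape,$cached" in
        yes,yes) echo "tape and cache" ;;
        yes,no)  echo "tape only" ;;
        no,yes)  echo "cache only" ;;
        *)       echo "unknown" ;;
    esac
}
```

The result unknown covers both failed calls and files for which neither command reports success.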

slk_helpers json2hsm#

read metadata from JSON file and attach them to resource in StrongLink

$ slk_helpers json2hsm [options] <SL-JSON metadata file> <GNS path>

-l, --expect-json-lines: consider the input file to be JSON-lines instead of normal JSON, Default: false
--ignore-non-existing-metadata-fields: if set: if a metadata field of the source metadata record does not exist in StrongLink then this metadata field is skipped. if not set: throw an error and exit as soon as a source metadata field does not exist in StrongLink. If this flag is not set but -k/--skip-bad-metadata-sets is set, then metadata records with non-existing metadata fields will be skipped. Default: false
--instant-metadata-record-update: not set: read the whole JSON file and collect all metadata updates => apply all updates in the end; if two metadata records exist for one resource, this will become apparent before any metadata are written; if set: write each metadata record to StrongLink directly after it has been read from the JSON file; if two metadata records exist for one resource, the first metadata record will be written to StrongLink and the duplication will remain undetected until the duplicate record is read from JSON. Default: false
-q, --quiet: print nothing to stdout (e.g. no summary), Default: false
-r FILE, --restart-file FILE: set a restart file in which the processed metadata entries are listed (if restart file exists, listed files will be skipped)
-s SCHEMA[,SCHEMA[...]], --schema SCHEMA[,SCHEMA[...]]: import only metadata fields of listed schemata (comma-separated list without spaces)
-k, --skip-bad-metadata-sets: skip damaged / incomplete metadata sets [default: throw error], Default: false
-v, --verbose: activate verbose mode, Default: false
-m MODE, --write-mode MODE: select write mode for metadata, Default: OVERWRITE, Possible Values: OVERWRITE, KEEP, ERROR, CLEAN (CLEAN: first, delete all metadata from the target schema and, then, write new metadata)

Reads metadata from a JSON file and writes them to archived files in the HSM. Uses relative paths from metadata records plus base path provided by the user to identify target files. See JSON structure for/of metadata import/export for details.

slk_helpers job_exists#

check if job with given ID exists in StrongLink

$ slk_helpers job_exists JOB_ID

Check if a tape read job (recall job) or verify job with the given ID exists.

slk_helpers job_queue#

print status information of the StrongLink queue

$ slk_helpers job_queue

-i <INTERPRET_TYPE>, --interpret <INTERPRET_TYPE>: interpret the length of the StrongLink recall job queue; possible values for INTERPRET_TYPE:
    * RAW, R: same as argument not set
    * TEXT, T: print short textual interpretation of the queue status => none, short, medium, long, jammed
    * DETAILS, D: print detailed textual interpretation of the queue status
    * NUMERIC, N: print a number representing the queue status => 0 (==none), 1 (==short), …, 4 (==jammed)

Prints the length or the status of the queue of tape read jobs (recall jobs). The output looks like this:

$ slkh job_queue
total read jobs: 110
active read jobs: 12
queued read jobs: 98

$ slkh job_queue --interpret N
3

$ slkh job_queue --interpret T
long

or like this:

$ slkh job_queue
total read jobs: 4
active read jobs: 4
queued read jobs: 0

$ slkh job_queue --interpret N
0

$ slkh job_queue --interpret T
none

$ slk_helpers job_queue --interpret D
no queue, waiting time in the queue: none
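
The numeric interpretation is convenient for scripts that should only submit new recalls while the queue is short. A minimal sketch, with queue_at_most as a hypothetical helper name:

```shell
# Succeed (exit code 0) only if the recall queue level is not higher than
# the given level. Levels follow '--interpret N': 0 (none) ... 4 (jammed).
# Exit code 2 signals a failed call or non-numeric output.
queue_at_most() {
    local max_level="$1" level
    level=$(slk_helpers job_queue --interpret N) || return 2
    case "$level" in
        ''|*[!0-9]*) echo "unexpected output: $level" >&2; return 2 ;;
    esac
    [ "$level" -le "$max_level" ]
}
```

Possible call: queue_at_most 2 && slk_helpers recall /arch/.../file01.nc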

slk_helpers job_report#

Warning

This command is not needed/recommended anymore and will be removed or deactivated in future releases of slk_helpers. We recommend using the command result_verify_job instead.

extract basic report of a StrongLink job and print it

slk_helpers job_status#

return status of the StrongLink job

$ slk_helpers job_status JOB_ID

Check the status of a tape read job with the given ID. The status is one of these: ABORTED, QUEUED, PROCESSING, COMPLETED, SUCCESSFUL, FAILED and PAUSED. When the status is QUEUED then the place in the queue is appended in brackets, e.g.: QUEUED (12).

See Job Stati for descriptions of the job stati.
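
A script can poll job_status until a job has finished. The sketch below assumes that QUEUED, PROCESSING and PAUSED are the only statuses in which a job may still change; every other status is printed and treated as final. The name wait_for_job and the default interval of 60 seconds are illustrative choices.

```shell
# Poll the status of a job until it is neither queued, processing nor
# paused. Prints the final status; returns 2 if job_status itself fails.
wait_for_job() {
    local job_id="$1" interval="${2:-60}" status
    while :; do
        status=$(slk_helpers job_status "$job_id") || return 2
        status=${status%% *}   # "QUEUED (12)" -> "QUEUED"
        case "$status" in
            QUEUED|PROCESSING|PAUSED) sleep "$interval" ;;
            *) echo "$status"; return 0 ;;
        esac
    done
}
```

Possible call: final_status=$(wait_for_job 1234567 120)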

slk_helpers has_no_flag_partial#

check if resources are not flagged as partial files

$ slk_helpers has_no_flag_partial (RESOURCE_PATH [RESOURCE_PATH [...]]|--resource-id RESOURCE_ID|--search-id SEARCH_ID) [-R] [-v|-vv]

-R: search recursively in RESOURCE_PATH for files if RESOURCE_PATH is a namespace/directory
--resource-id: check file with provided RESOURCE_ID instead of RESOURCE_PATH and SEARCH_ID; default: -1
--search-id SEARCH_ID: check all files represented by provided SEARCH_ID instead of file with RESOURCE_PATH and a RESOURCE_ID; default: -1
-v: single verbose mode; print list of files with flag (== non-matching files) and a summary line
-vv: double verbose mode; print list of checked files incl. their status (has no partial flag and has partial flag) and a summary line

Note

Please provide either RESOURCE_PATH or --search-id SEARCH_ID or --resource-id RESOURCE_ID.

Checks if a resource RESOURCE_PATH is flagged as “partial file” and prints the resource path if this is the case. --invert inverts the command checking mechanism so that all files which are not flagged as “partial file” are printed. Accepts multiple RESOURCE_PATHs. Additionally, the exit code will be 0 if at least one match was found and 1 if no match was found (exit code: get the variable $? directly after the slk call).

slk_helpers list_clone_file#

print details on resources; similar to slk list but has additional options and prints more details

$ slk_helpers list_clone_file [--print-resource-ids] [--print-more-timestamps] [--print-timestamps-as-seconds-since-1970] [--proceed-on-error] (--read-from-file <LOCAL FILE>|<RESOURCE_PATH> [<RESOURCE_PATH> [<RESOURCE_PATH> [...]]])

--print-resource-ids: print resource ids instead of file paths
--print-more-timestamps: print additional timestamps which are not supported when the results of a search id are printed
--print-timestamps-as-seconds-since-1970: print timestamps in seconds since 1970
--proceed-on-error: Proceed listing files even if an error arose
--read-from-file <LOCAL FILE>: Read file list from file instead of command line arguments or stdin

Prints information on the provided <RESOURCE_PATH>s. Does not print the content of namespaces / folders but only information on the targeted resource itself.

If --read-from-file <LOCAL FILE> is set, the <RESOURCE_PATH>s are ignored.

The output consists of 10 columns:

col 1: permissions and storage location
col 2: uid / user id
col 3: gid / group id
col 4: size in byte
col 5: mtime of the file (time stamp of “last modification of the file prior to the archival”)
col 6: time stamp of first file version in StrongLink
col 7: time stamp of current file version in StrongLink
col 8: time stamp of last StrongLink-internal copy process of this file (e.g. last recall)
col 9: tape id if file is stored on one tape
col 10: full path

slk_helpers list_clone_search#

print details on search results; similar to slk list but has additional options and prints more details

$ slk_helpers list_clone_search [-f] [-d] SEARCH_ID

-f, --only-files: list only search results which are files (same as slk list)
-d, --only-directories, --only-namespaces: print only search results which are namespaces (cannot be printed by slk list)
--print-resource-ids: print resource ids instead of file paths
--count COUNT: print not more than COUNT results (hidden; see note below)
--start START: skip the first START - 1 results (hidden; see note below)

Lists all search results of the search SEARCH_ID. Prints the full path of all search results. If -f and -d are both provided, the output is the same as if neither was set.

The output consists of 10 columns:

col 1: permissions and storage location
col 2: uid / user id
col 3: gid / group id
col 4: size in byte
col 5: mtime of the file (time stamp of “last modification of the file prior to the archival”)
col 6: time stamp of first file version in StrongLink
col 7: time stamp of current file version in StrongLink
col 8: time stamp of last StrongLink-internal copy process of this file (e.g. last recall)
col 9: tape id if file is stored on one tape
col 10: full path

Note

--start and --count refer to the search results before access permissions of the user are applied. Thus, fewer results than specified by --count might be printed.

Warning

slk_helpers list_clone_search SEARCH_ID first collects all search results and then prints them. This might take a while if many search results are found. However, we print a warning if this is the case.

slk_helpers list_search#

Warning

slk_helpers list_search is deprecated; please use slk list, slk_helpers list_clone_file or slk_helpers list_clone_search instead

print details on search results

$ slk_helpers list_search [-f] [-d] SEARCH_ID

-f, --only-files: list only search results which are files (same as slk list)
-d, --only-directories, --only-namespaces: print only search results which are namespaces (cannot be printed by slk list)
--count COUNT: print not more than COUNT results (hidden; see note below)
--start START: skip the first START - 1 results (hidden; see note below)

Lists all search results of the search SEARCH_ID. Prints the full path of all search results. If -f and -d are both provided, the output is the same as if neither was set.

Note

slk_helpers list_search collects all search results independent of whether the user has read permissions or not; --start and --count refer to these search results and not to the search results the user is allowed to see.

slk_helpers metadata#

prints resource metadata; targets one resource only

$ slk_helpers metadata RESOURCE_PATH

--alternative-output-format: different format to print metadata (each row is: schema.field: value), Default: false

Prints the available metadata of a resource. Corresponds to slk tag – whereas slk tag sets metadata and slk_helpers metadata prints metadata.

slk_helpers mkdir#

create a namespace (directory) in StrongLink

$ slk_helpers mkdir [-R] GNS_PATH

-p, --parents: use -p to create folders recursively if the parent folders do not exist; throw no error if folder already exists (like ‘mkdir -p’); Default: false
-R: use -R to create folders recursively if the parent folders do not exist; throw an error if folder already exists

Creates a namespace in an already existing namespace (== create basename GNS_PATH in dirname GNS_PATH). This command works like mkdir on a Linux terminal. Create nested namespaces recursively when -R is set. slk_helpers mkdir -p behaves like mkdir -p on the Linux terminal.

slk_helpers print_rcrs#

print resource storage details

$ slk_helpers print_rcrs (RESOURCE_PATH|--resource-id RESOURCE_ID)

--resource-id: get rcrs of a file with given resource id instead of path

Gets the resource content records (rcrs) for a resource path or resource id. Some files which were archived by HPSS were split into two parts which were stored on different tapes. If these files are accessed via StrongLink, each file part gets its own checksum. There is no overall checksum stored for the combined file. Therefore, slk_helpers checksum prints no checksums for such files. If you need to verify such split files after retrieval, you can get the size and checksum of each file part via this command and then split the file via split -b <SIZE> <FILE>. The command does not necessarily print the file part information in the correct order. The information on the second file part might be printed first.
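
The verification of a split file can be sketched as follows. Instead of split -b, this variant uses head and tail, so that a second part which is larger than the first is not cut again. It assumes that the part checksums are sha512 sums, which is not stated above; part_checksums is a hypothetical function name.

```shell
# Print size and sha512 checksum of the two parts of a local file, where
# the first part has the given size in bytes.
# Assumption: the checksums of the file parts are sha512 sums.
part_checksums() {
    local file="$1" first_size="$2" total
    total=$(wc -c < "$file" | tr -d ' ')
    if [ "$first_size" -le 0 ] || [ "$first_size" -ge "$total" ]; then
        echo "first part size must be between 1 and $((total - 1))" >&2
        return 1
    fi
    # first part: the leading first_size bytes
    printf 'part1 %s %s\n' "$first_size" \
        "$(head -c "$first_size" "$file" | sha512sum | awk '{print $1}')"
    # second part: everything from byte first_size + 1 onwards
    printf 'part2 %s %s\n' "$((total - first_size))" \
        "$(tail -c "+$((first_size + 1))" "$file" | sha512sum | awk '{print $1}')"
}
```

If the printed values do not match the output of print_rcrs, it may help to repeat the call with the size of the other part, because the parts might be listed in reverse order.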

slk_helpers recall#

Note

Extended test phase of this command. Please report bugs and feature requests to the DKRZ support (support@dkrz.de).

submit recall job

$ slk_helpers recall [<RESOURCE_PATH_1> [<RESOURCE_PATH_2> [...]]|<RESOURCE_ID_1> [<RESOURCE_ID_2> [...]]|<SEARCH_ID>] [(-d|--destination) <RETRIEVAL_DESTINATION>] [--dry-run] [-ns] [-R] [--resource-ids|--search-id] [--suppress-input-info] [-v|-vv]

The command attempts to read from stdin if neither <RESOURCE_PATH_1> ... nor <RESOURCE_ID_1> ... nor <SEARCH_ID> are provided. This means that you can also pipe data into this command.

-d <RETRIEVAL_DESTINATION>, --destination <...>: destination path; only files which do not exist in <destination> will be considered; only considered when individual source files are provided; if source is a namespace, this option is ignored
--dry-run: list which files would be recalled from tape to cache but do not actually start the recall job
-ns: preserve namespace in destination (takes effect only in combination with -d / --destination)
-R: use the -R flag to recall recursively
--resource-ids: interprets the input as a list of resource ids
--search-id: interprets the input as a search id
--suppress-input-info: suppress user information when input is read from stdin
-v: single verbose mode
-vv: double verbose mode

Starts recall job for the provided resources. Returns recall job ID instantly after job has been submitted. Can recall from a maximum of four tapes at once. If you provide resource ids or a search id, they do not need to be provided directly after --resource-ids or --search-id. Instead, these two parameters are merely switches for the interpretation of the input. They also apply when resources are provided via stdin.

Set -d <RETRIEVAL_DESTINATION> so that only files which are not already in <RETRIEVAL_DESTINATION> are recalled. recall expects the files to be directly in <RETRIEVAL_DESTINATION>. If -ns is set, reconstructs the full paths of the resources in <RETRIEVAL_DESTINATION>.

slk_helpers recall_needed#

Note

Extended test phase of this command. Please report bugs and feature requests to the DKRZ support (support@dkrz.de).

check whether recall job would be submitted if recall was run

$ slk_helpers recall_needed [<RESOURCE_PATH_1> [<RESOURCE_PATH_2> [...]]|<RESOURCE_ID_1> [<RESOURCE_ID_2> [...]]|<SEARCH_ID>] [(-d|--destination) <RETRIEVAL_DESTINATION>] [--dry-run] [-ns] [-R] [--resource-ids|--search-id] [--suppress-input-info] [-v|-vv]

The command attempts to read from stdin if neither <RESOURCE_PATH_1> ... nor <RESOURCE_ID_1> ... nor <SEARCH_ID> are provided. This means that you can also pipe data into this command.

-d <RETRIEVAL_DESTINATION>, --destination <...>: destination path; only files which do not exist in <destination> will be considered; only considered when individual source files are provided; if source is a namespace, this option is ignored
--dry-run: list which files would be recalled from tape to cache but do not actually start the recall job
-ns: preserve namespace in destination (takes effect only in combination with -d / --destination)
-R: use the -R flag to recall recursively
--resource-ids: interprets the input as a list of resource ids
--search-id: interprets the input as a search id
--suppress-input-info: suppress user information when input is read from stdin
-v: single verbose mode
-vv: double verbose mode

Checks whether a recall job would be submitted for the provided resources if slk_helpers recall was run. Behaves slightly differently from the recall command, which is why this feature was implemented as a separate command instead of an option like --dry-run. Differences between recall_needed and recall are:

returns text (“Resources need to be recalled.” / “No resources to recall.”) instead of an integer (recall job id)
does not check whether required tapes are available

If you provide resource ids or a search id, they do not need to be provided directly after --resource-ids or --search-id. Instead, these two parameters are merely switches for the interpretation of the input. They also apply when resources are provided via stdin.

Set -d <RETRIEVAL_DESTINATION> so that only files which are not already in <RETRIEVAL_DESTINATION> are recalled. recall_needed expects the files to be directly in <RETRIEVAL_DESTINATION>. If -ns is set, reconstructs the full paths of the resources in <RETRIEVAL_DESTINATION>.
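The two commands can be combined in a script. The following sketch is only an illustration: the function name recall_if_needed is made up, and it assumes that recall_needed writes the message quoted above to stdout.

```shell
# Hypothetical wrapper: submit a recall job only if recall_needed reports
# that at least one resource has to be recalled.
#   $1: retrieval destination (passed to -d)
#   remaining arguments: resource paths
recall_if_needed() {
    rin_dest="$1"
    shift
    # Assumption: the answer text is printed to stdout.
    rin_answer=$(slk_helpers recall_needed -d "$rin_dest" "$@")
    case "$rin_answer" in
        *"Resources need to be recalled."*)
            # prints the recall job id
            slk_helpers recall -d "$rin_dest" "$@"
            ;;
        *)
            echo "No recall needed."
            ;;
    esac
}
```

A call could look like recall_if_needed /work/ab1234/data /arch/ab1234/regular_file_01.nc, where both paths are placeholders.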

slk_helpers resourcepath#

Warning

Might be deprecated soon. Please use slk_helpers resource_path instead.

prints path of resource with provided id; targets one resource only

$ slk_helpers resourcepath RESOURCE_ID

slk_helpers resource_features#

Warning

experimental command; argument names and behaviour not considered as stable

Prints various details of resources. The user can select which details are printed. Each line contains all requested details of one resource.

slk_helpers resource_features \
    (RESOURCE_PATH [RESOURCE_PATH [...]]|--resource-id[s] RESOURCE_ID [RESOURCE_ID [...]]|--search-id SEARCH_ID) \
    [--features FEATURE1[,FEATURE2[...]]|--which-features|--full|--full-with-tapes] \
    [--include-namespaces|--only-namespaces] [--non-recursive] \
    [-v|-vv] [--separator1 <SEPARATOR1>] [--separator2 <SEPARATOR2>]

select input resources:

--include-namespaces, --ins: print namespaces in addition to files
-nR, --non-recursive: do not collect resources recursively
--only-namespaces, --ons: print only namespaces and no files
--resource-id, --resource-ids: consider input as resource ids instead of resource paths
--search-id: consider input as search id instead of resource path

select which features to print:

--which-features: print list of possible features
--features, -f: comma-separated list of features to print
--full: print all resource features except for detailed tape information, which would require additional API calls; please run with --full-with-tapes to include tape information
--full-with-tapes: print all resource features, including those which require additional API calls

control the output formatting:

--separator1, -s1: separator of results on the top level (column separator). Default: “,”
--separator2, -s2: separator of results on the second/nested level (subcolumn separator). Default: “;”

select which additional details should be printed (mostly to stderr):

--suppress-input-info: suppress user information when input is read from command line
-v: single verbose mode
-vv: double verbose mode

Prints resource features/details as selected by --features. --full prints all available details except for detailed tape storage details, which require additional API requests. This causes less traffic for StrongLink and improves the speed of this command. --full-with-tapes prints all available details. An overview of all available features is printed when --which-features is set. In future versions we might provide nicer feature names.

Feature names starting with rcr are related to individual physical file parts. HPSS split a few files into multiple parts. A resource has an official size (size) and modification date (modified_iso). Additionally, it has a list of individual sizes (rcr_sizes) or modification dates (rcr_modified_iso) of the individual file parts. The separator defined by --separator2 separates these list items. Even in the case of non-split files, the values of features rcr_<X> and <X> might differ.

By default, the command prints features of files but does not print features of namespaces. If the features of namespaces should be printed in addition, then please set --ins / --include-namespaces. If only the features of namespaces and not of files should be printed, then please set --ons / --only-namespaces.

By default, the command automatically evaluates the content of namespaces recursively. This is different to other commands which require -R / --recursive to be set for recursive evaluation of the content of namespaces. If no recursive evaluation is required, please set -nR / --non-recursive.

Example: print file paths and sizes of all files in /arch/bm0146/k204221/iow

$ slk_helpers resource_features /arch/bm0146/k204221/iow --features path,size
id,path,size
49058705519,/arch/bm0146/k204221/iow/iow_data_006.tar,8364490752
49058705518,/arch/bm0146/k204221/iow/iow_data_005.tar,20478689280
49058705517,/arch/bm0146/k204221/iow/iow_data_004.tar,20715667456
49058705497,/arch/bm0146/k204221/iow/INDEX.txt,1268945
Resources: 4; Resources with Errors: 0

Example: print file path and size of /arch/bm0146/k204221/iow and of all files in it

$ slk_helpers resource_features /arch/bm0146/k204221/iow --include-namespaces --features path,size
id,path,size
49058658013,/arch/bm0146/k204221/iow,0
49058705519,/arch/bm0146/k204221/iow/iow_data_006.tar,8364490752
49058705518,/arch/bm0146/k204221/iow/iow_data_005.tar,20478689280
49058705517,/arch/bm0146/k204221/iow/iow_data_004.tar,20715667456
49058705497,/arch/bm0146/k204221/iow/INDEX.txt,1268945
Resources: 5; Resources with Errors: 0

Example: same as first example but with a different column separator

$ slk_helpers resource_features /arch/bm0146/k204221/iow --features path,size --separator1 "|"
id|path|size
49058705519|/arch/bm0146/k204221/iow/iow_data_006.tar|8364490752
49058705518|/arch/bm0146/k204221/iow/iow_data_005.tar|20478689280
49058705517|/arch/bm0146/k204221/iow/iow_data_004.tar|20715667456
49058705497|/arch/bm0146/k204221/iow/INDEX.txt|1268945
Resources: 4; Resources with Errors: 0

Example: barcodes and sizes of fileparts of split and non-split files (compare example in slk_helpers resource_tapes)

$ slk_helpers resource_features /arch/ab1234/regular_file_01.nc /arch/ab1234/regular_file_02.nc /arch/ab1234/split_file.nc --features path,rcr_tape_barcodes,rcr_sizes,size
id,path,rcr_tape_barcodes,rcr_sizes,size
49000000001,/arch/ab1234/regular_file_01.nc,B09216L5,1268945,1268945
49000000002,/arch/ab1234/regular_file_02.nc,C25543L6,20942159872,20942159872
49000000003,/arch/ab1234/split_file.nc,C44911L6;C44904L6,9063669760;37748736000,46812405760
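The column-oriented output is convenient for post-processing. As a minimal sketch, the total size of all listed files can be computed with awk. The function name sum_sizes is hypothetical. The sketch assumes that the command was run with --features path,size --separator1 "|" so that the size is the last column and commas in paths cause no trouble. Only lines that start with a numeric resource id are counted, so the header and the summary line are skipped.

```shell
# Hypothetical helper: sum the last column (size in bytes) of
# "slk_helpers resource_features ... --features path,size --separator1 '|'".
# Reads the command output from stdin.
sum_sizes() {
    awk -F'|' '
        $1 ~ /^[0-9]+$/ { total += $NF }        # data rows start with a resource id
        END { printf "%.0f\n", total }           # %.0f avoids integer overflow in some awk versions
    '
}
```

Usage: slk_helpers resource_features /arch/bm0146/k204221/iow --features path,size --separator1 "|" | sum_sizes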

slk_helpers resource_id#

prints id(s) of resource(s) with provided path(s)

$ slk_helpers resource_id [<RESOURCE_PATH_1> [<RESOURCE_PATH_2> [...]]|--read-from-file <FILE_WITH_RESOURCE_PATHS>] [--proceed-on-error]

The command attempts to read from stdin if neither <RESOURCE_PATH_1> ... nor --read-from-file <FILE_WITH_RESOURCE_PATHS> are provided. This means that you can also pipe data into this command.

--proceed-on-error: Proceed listing files even if an error arose
--read-from-file <FILE_WITH_RESOURCE_PATHS>: Read file list from file instead of command line arguments or stdin

Prints the resource path and the corresponding resource id separated by “: ” (a colon followed by a space). Prints one pair per line. Similar to exists.

slk_helpers resource_path#

Prints path of resource with provided id. Targets one resource only. Please use slk_helpers resource_features --features path --resource-ids when you wish to print the paths of multiple resources based on their resource ids.

$ slk_helpers resource_path RESOURCE_ID

Gets path for a resource id and returns it.

slk_helpers resource_permissions#

Prints resource permissions. Targets one resource only. Please use slk_helpers resource_features --features path,mode_rxw when you wish to print the permissions/mode of multiple resources.

$ slk_helpers resource_permissions (RESOURCE_PATH|--resource-id RESOURCE_ID)

--resource-id: get permissions of a resource with given resource id instead of path; default: -1
--as-octal-number: do not return the permissions as a combination of x, w, r and - but as a three-digit octal number

Gets permissions for a resource path or resource id as a combination of x, w, r and -.

slk_helpers resource_tape#

Prints the tape on which a resource is stored. Targets one resource only. Please use slk_helpers resource_tapes or slk_helpers resource_features --features path,rcr_tape_barcodes when you wish to print the tape barcodes of multiple resources.

$ slk_helpers resource_tape [--json] [--print-path] [--quiet|-q] (<RESOURCE_PATH>|--resource-id <RESOURCE_ID>)

--json: print output in JSON
--print-path: include the path of the file in the output
--quiet, -q: no summary line is printed
--resource-id: get tape of a file with given resource id instead of path; default: -1

Returns the tape(s) on which the provided file is stored. Does not process more than one file as input and does not process the content of namespaces / folders recursively.

slk_helpers resource_tapes#

Prints barcodes of tapes on which one or multiple resources are stored. Information for one resource is printed per line. Simplified version of resource_tape made to target more than one resource.

$ slk_helpers resource_tapes [--print-sizes] [--proceed-on-error] (<RESOURCE_PATH>|--resource-id <RESOURCE_ID>|--read-from-file <FILE_WITH_RES_PATHS>|--resource-id --read-from-file <FILE_WITH_RES_IDS>)

--print-sizes: print sizes in addition to tape barcodes. Default: false
--proceed-on-error: Proceed listing files even if an error arose. Default: false
--read-from-file: Read file list from file instead of command line arguments or stdin. Default: <empty string>
--resource-ids: Get tape on which a file with given resource id is stored on (instead of path).

The command resource_tape was introduced much earlier than this command. However, its output format is not suited for printing information on multiple resources. The output format of all slk_helpers commands should remain backward compatible and we did not want to introduce additional arguments to resource_tape. Therefore, this simplified command was introduced.

If a file was stored with HPSS and was split amongst multiple tapes, the command prints the tape barcodes as a comma-separated list. When --print-sizes is set, the sizes of all file parts followed by the overall size are printed as well. The barcodes, partial sizes and overall size are separated by semicolons.

Example:

$ slk_helpers resource_tapes /arch/ab1234/regular_file_01.nc /arch/ab1234/regular_file_02.nc /arch/ab1234/split_file.nc
/arch/ab1234/regular_file_01.nc: B09216L5
/arch/ab1234/regular_file_02.nc : C25543L6
/arch/ab1234/split_file.nc: C44911L6,C44904L6

$ slk_helpers resource_tapes /arch/ab1234/regular_file_01.nc /arch/ab1234/regular_file_02.nc /arch/ab1234/split_file.nc --print-sizes
/arch/ab1234/regular_file_01.nc: B09216L5; 1268945; 1268945
/arch/ab1234/regular_file_02.nc : C25543L6; 20942159872; 20942159872
/arch/ab1234/split_file.nc: C44911L6,C44904L6; 9063669760,37748736000; 46812405760
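To see how many different tapes a set of files touches, the output shown above can be reduced to a sorted list of unique barcodes. This is a sketch under the assumption that the output format is exactly as in the examples (path, then a colon and a space, then the barcode list, optionally followed by semicolon-separated sizes). The function name unique_tapes is hypothetical.

```shell
# Hypothetical helper: extract the unique tape barcodes from the output of
# "slk_helpers resource_tapes" (with or without --print-sizes).
# Reads the command output from stdin.
unique_tapes() {
    awk -F': ' '
        {
            list = $NF                   # text after the last ": "
            sub(/;.*/, "", list)         # drop sizes if --print-sizes was set
            n = split(list, codes, ",")  # split files have several barcodes
            for (i = 1; i <= n; i++) print codes[i]
        }
    ' | sort -u
}
```

Usage: slk_helpers resource_tapes /arch/ab1234/regular_file_01.nc /arch/ab1234/split_file.nc | unique_tapes | wc -l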

slk_helpers resource_type#

prints type of resource (file or namespace); targets one resource only

$ slk_helpers resource_type (RESOURCE_PATH|--resource-id RESOURCE_ID)

--resource-id: get type of a file with given resource id instead of path; default: -1

Gets the resource type (FILE or NAMESPACE) for a resource path or resource id

slk_helpers result_verify_job#

return the result of a finished verify job

$ slk_helpers result_verify_job [--header|--sources|--number-errors|--number-sources] <job_id>

--header: print header of the report instead of errors; Default: false
--json: print output as JSON; full output
--json-no-source: print output as JSON; drop source information; useful when job targeted 50000 files
--number-errors: print number of errors; Default: false
--number-sources: print number of source resources; note: if one resource was targeted, it might be one file or namespace; if the number of targeted resources is larger than one, all of them were files
--quick: disable additional file verification checks performed for the verify job’s results; not recommended; might reduce run time
--sources: print the sources (source resources, source namespace)
-v, --verbose: verbosity level 1
-vv, --double-verbose, --verbose-verbose: verbosity level 2

Prints verification errors collected by a verify job with the id job_id. “Verification” means that the target size and the actual size of each targeted file are compared. Mismatches between these two sizes cause a verification error. The full report of a verification job, which can be extracted via slk_helpers job_report <job_id>, might contain additional warnings and errors which are not relevant for the user.

This command has performed some additional checks for the verified files since slk_helpers version 1.13.1 because a few rare issues are not detected by verify jobs. These rare issues arose fewer than 50 times for more than 20 million archived files. When more than 5000 files are targeted by a verify job, this feature might considerably increase the run time of result_verify_job. The argument --quick can be set to deactivate this feature and reduce the run time. We discourage the usage of this argument.

slk_helpers retrieve#

Note

Extended test phase of this command. Please report bugs and feature requests to the DKRZ support (support@dkrz.de).

retrieve provided resources if they are cached but does not automatically start recall from tape

$ slk_helpers retrieve ((-d|--destination) <RETRIEVAL_DESTINATION>) [<RESOURCE_PATH_1> [<RESOURCE_PATH_2> [...]]|<RESOURCE_ID_1> [<RESOURCE_ID_2> [...]]|<SEARCH_ID>] [--dry-run] [-ns] [-R] [--resource-ids|--search-id] [--suppress-input-info] [-v|-vv] [(--slurm|--run-as-slurm-job-with-account) <ACCOUNT>]

The command attempts to read from stdin if neither <RESOURCE_PATH_1> ... nor <RESOURCE_ID_1> ... nor <SEARCH_ID> are provided. This means that you can also pipe data into this command.

--dry-run: list which file would be recalled from the tape to cache but do not actually start recall job
--ignore-existing: skip if file exists (even when it is a different file)
--json-batch: prints a JSON summary to the terminal muting all other output to stdout
--json-to-file: prints a JSON summary to a file (if you do not want to capture it together with verbose information)
--print-progress: prints the progress of this command to stderr (if you do not want to capture it together with JSON or verbose information)
--resource-ids: interprets the input as a list of resource ids
--slurm <ACCOUNT>, --run-as-slurm-job-with-account <...>: generate a SLURM job script, which will retrieve requested files (no automatic recall!) and submit it directly (only create it if --dry-run is set)
--search-id: interprets the input as a search id
--stop-on-failed-retrieval: stop immediately when one file cannot be retrieved
--suppress-input-info: suppress user information when input is read from stdin
--write-envisaged-to-file: write files, which could be retrieved, to provided file (failed files, which should be retrieved, are ignored)
--write-missing-to-file: write files, which currently cannot / could not be retrieve, to provided file
-d <RETRIEVAL_DESTINATION>, --destination <...>: destination path; only files which do not exist in <destination> will be considered; only considered when individual source files are provided; if source is a namespace, this option is ignored; required
-f, --force-overwrite: force overwrite of all existing files
-ns: preserve namespace in destination (takes effect only in combination with -d / --destination)
-R: use the -R flag to recall recursively
-v: single verbose mode
-vv: double verbose mode

Starts retrieval of the provided resources to the provided destination for all resources which are in the cache. Does not start recalls for resources which are not in the cache. If you provide resource ids or a search id, they do not need to be provided directly after --resource-ids or --search-id. Instead, these two parameters are merely switches for the interpretation of the input. They also apply when resources are provided via stdin.

Retrieves only files which are not already in <RETRIEVAL_DESTINATION>. retrieve expects the files to be directly in <RETRIEVAL_DESTINATION>. If -ns is set, reconstructs the full paths of the resources in <RETRIEVAL_DESTINATION>.

Usage of --run-as-slurm-job-with-account is described here.

slk_helpers search_immediately#

start search and return search id immediately

$ slk_helpers search_immediately "RQL Search Query"
$ slk_helpers search_immediately 'RQL Search Query'

This command starts a background search, exits immediately after the search has been submitted and returns the corresponding search id. The search query has to be specified in a query language that was designed by StrongLink. The query language is described on the StrongLink query language page and in the StrongLink Command Line Interface Guide from page 6 onwards.

Note

Operators in queries start with a $. If a query is delimited by " then the $ has to be escaped by a leading \ (\$OPERATOR). Otherwise, the operator is interpreted as environment variable by the shell. Alternatively, use ' as delimiter.
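The following sketch shows the two quoting styles side by side. The query itself is only an illustrative assumption (a path field compared with a $gte operator against a placeholder path) and is not taken from this page. The point is that both shell strings are identical once the $ is escaped inside double quotes.

```shell
# Single quotes: the shell leaves $gte untouched.
query_single='{"path": {"$gte": "/arch/ab1234"}}'

# Double quotes: inner double quotes and the $ must be escaped with a backslash.
query_double="{\"path\": {\"\$gte\": \"/arch/ab1234\"}}"

# Both variables now hold exactly the same text.
if [ "$query_single" = "$query_double" ]; then
    echo "identical"
fi
```

Either variable can then be passed as one quoted argument, for example slk_helpers search_immediately "$query_single".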

slk_helpers search_incomplete#

check if submitted search has not completed yet

Warning

This command is work in progress and might be changed in future.

$ slk_helpers search_incomplete <SEARCH_ID>

Prints out whether the search, to which the SEARCH_ID points, is incomplete (== still running) or not (== finished). A complete search might be successful or failed (see slk_helpers search_successful).

slk_helpers search_limited#

Warning

This command will be deprecated soon. Please use slk search or, in special situations, slk_helpers search_immediately instead.

submit a search query

$ slk_helpers search_limited "RQL Search Query"
$ slk_helpers search_limited 'RQL Search Query'

This command will conduct a background search for files that match the specific query specified using a query language syntax that was designed by StrongLink. The query language is described on the StrongLink query language page and in the StrongLink Command Line Interface Guide from page 6 onwards.

Note

A search result ID (search_id) will be returned if 1000 or fewer results were found. If more results were found, an error will be printed and no search ID will be returned. 1000 refers to the total number of results, some of which might not be visible to the user. The search ID can be used to list and retrieve files from the archive (see below).

One might need the user or group ids of respective users/groups to search files belonging to them. These ids are obtained as follows.

Get user id:

# get your user id
$ id -u

# get the id of any user
$ id -u USER_NAME

# get the id of any user
$ getent passwd USER_NAME
#  OR
$ getent passwd USER_NAME | awk -F: '{ print $3 }'

# get user name from user id
$ getent passwd USER_ID | awk -F: '{ print $1 }'

Get group id:

# get group ID from group name
$ getent group GROUP_NAME
#  OR
$ getent group GROUP_NAME | awk -F: '{ print $3 }'

# get group name from group id
$ getent group GROUP_ID | awk -F: '{ print $1 }'

# get names and ids of all groups of which you are a member
$ id

Please see our documentation of specific Usage Examples (slk Usage Examples) and the StrongLink Command Line Interface Guide for exemplary calls of slk search. These can be used unchanged with slk_helpers search_limited.

Note

slk_helpers search_limited counts all files and namespaces that match the search query and that the current user is allowed to see/read. slk list lists only the respective files. Therefore, slk_helpers search_limited might print an error that more than 1000 resources were found although there are fewer than 1000 matches for which the user has read permissions. Moreover, different users might get different output from slk list for the same search id.

slk_helpers search_status#

return status of search

$ slk_helpers search_status <SEARCH_ID>

Prints the status of a search.

slk_helpers search_successful#

check if search was successful

Warning

This command is work in progress and might be changed in future.

$ slk_helpers search_successful <SEARCH_ID>

Prints out whether the search, to which the SEARCH_ID points, was successful or not. A not-successful search might have failed or be incomplete (see slk_helpers search_incomplete).

slk_helpers searchid_exists#

check if search id exists

$ slk_helpers searchid_exists <SEARCH_ID>

Prints out whether the provided search ID exists or not.

slk_helpers session#

check if current slk session is valid

$ slk_helpers session

Prints until when the current slk session is valid.

slk_helpers size#

print size of resource; targets one resource only

$ slk_helpers size (RESOURCE_PATH|--resource-id RESOURCE_ID) [-R|--recursive] [--pad-spaces-left WIDTH] [-v|-vv]

--pad-spaces-left <width>: pad spaces on the left of the printed size so that the total width (spaces + number) is width; default: -1 (no padding)
-R, --recursive: calculate folder size by summing sizes of contained files recursively
--resource-id: get size of a file with given resource id instead of path; default: -1
-v: single verbose mode; print sizes of all namespaces recursively
-vv: double verbose mode; print sizes of all resources recursively

Returns the file size in bytes. If a namespace / directory is targeted and -R / --recursive is not set, 0 is returned. If a namespace / directory is targeted and -R / --recursive is set, the size is calculated recursively. If the resource does not exist, an error is printed and exit code 1 is returned. All other errors cause an exit code of 2.
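The exit codes described above can be used in scripts to distinguish a missing resource from other failures. The following is a minimal sketch: the function name report_size and the wording of its messages are made up for illustration.

```shell
# Hypothetical wrapper: print the size of one resource or a short
# explanation based on the exit code of "slk_helpers size".
#   $1: resource path
report_size() {
    rs_size=$(slk_helpers size "$1")
    rs_rc=$?   # must be read directly after the command substitution
    case "$rs_rc" in
        0) echo "$1: $rs_size bytes" ;;
        1) echo "$1: resource does not exist" >&2 ;;
        *) echo "$1: unexpected error (exit code $rs_rc)" >&2 ;;
    esac
    return "$rs_rc"
}
```

The function passes the original exit code on, so that a calling script can still react to it.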

slk_helpers start_watchers#

Submits recall and/or retrieve watcher scripts as SLURM jobs to obtain files requested by slk_helpers init_watchers or slk_helpers gfbt -wf1 <dst>. slk_helpers init_watchers or slk_helpers gfbt -wf1 has to be run in advance. Use slk_helpers stop_watchers to end the started watchers manually.

slk_helpers start_watchers <ACCOUNT> [--only-recall-watcher|--only-retrieve-watcher] [--max-iterations-recall <a number>] [--max-iterations-retrieve <a number>]

ACCOUNT: compute time account/project at DKRZ for SLURM job submission
--max-iterations-recall: the recall watcher is a SLURM job script which will run repeatedly and recall files; it will stop if the maximum number of iterations is reached; Default: 1008
--max-iterations-retrieve: the retrieval watcher is a SLURM job script which will run repeatedly and retrieve cached files; it will stop if the maximum number of iterations is reached; Default: 1008
--only-recall-watcher, --recall-watcher-only, --only-recall, --recall-only: if this flag is set, only the recall watcher is started; default: start recall and retrieve watchers
--only-retrieve-watcher, --retrieve-watcher-only, --only-retrieve, --retrieve-only: if this flag is set, only the retrieve watcher is started; default: start recall and retrieve watchers

Requires slk_helpers init_watchers to be run in advance in the folder where slk_helpers start_watchers is run. slk_helpers start_watchers starts “recall_watcher” and “retrieve_watcher” scripts which are described in the retrieval documentation and here in detail. These scripts are meant to recall and retrieve larger numbers of files for you in an efficient way. Usage examples are given on the linked pages of the documentation.

slk_helpers stop_watchers#

Stop recall and/or retrieve watcher(s) which were started from the current directory. There is no difference, whether the scripts were started by the start_*_watcher.sh scripts or by slk_helpers start_watchers.

slk_helpers stop_watchers [--only-recall-watcher|--only-retrieve-watcher] [-v]

--only-recall-watcher, --recall-watcher-only, --only-recall, --recall-only: if this flag is set, only the recall watcher is stopped; default: stop recall and retrieve watchers
--only-retrieve-watcher, --retrieve-watcher-only, --only-retrieve, --retrieve-only: if this flag is set, only the retrieve watcher is stopped; default: stop recall and retrieve watchers
--verbose, -v: verbose mode

slk_helpers stop_watchers stops “recall_watcher” and “retrieve_watcher” scripts which are described in the retrieval documentation and here in detail. These scripts are meant to recall and retrieve larger numbers of files for you in an efficient way. Usage examples are given on the linked pages of the documentation.

slk_helpers submit_verify_job#

submits a verify job and returns the job id

$ slk_helpers submit_verify_job [-v] RESOURCE_PATH [RESOURCE_PATH [...]] [(-R|--recursive)] [--save-mode]
$ slk_helpers submit_verify_job [-v] --resource-ids RESOURCE_ID [RESOURCE_ID [...]] [(-R|--recursive)] [--save-mode]
$ slk_helpers submit_verify_job [-v] --search-id SEARCH_ID [--save-mode] [--resume-on-page <n>]
$ slk_helpers submit_verify_job [-v] --search-query 'SEARCH_QUERY' [--save-mode]
# currently, only for admin users:
$ slk_helpers submit_verify_job [-v] (-i|--infile|--input) JSON_VERIFY_JOB_FILE

-i, --infile, --input JSON_VERIFY_JOB_FILE: a verify job can be described by a JSON expression; this JSON can be provided as file to this command
-j: print output as JSON
-R, --recursive: if a resource path or resource id points to a namespace, consider all resources in this namespace recursively
--resource-ids RESOURCE_ID [RESOURCE_ID [...]]: target resources by their resource ids instead of their resource paths
--resume-on-page <n>: resume the command and start submitting jobs starting with search result 1000 * n; internally, 1000 search results are on one ‘page’ and fetched by one request => therefore 1000 * n; you do not necessarily have read permissions for 1000 files per page
--save-mode: save mode suggested to be used in times of many timeouts; please do not regularly use this parameter; start one verify job per page of search results instead of one verify job for 50 pages of search results
--search-id SEARCH_ID: target resources which were found by this search
--search-query 'SEARCH_QUERY': target resources which will be found by a search defined by this search query
-v: verbose mode; print information on what is currently done; recommended

Starts a verify job for the selected files. Files for which the current user does not have read permissions are automatically ignored. No error message or warning is printed if files are ignored. The result of the verify job can currently be fetched as verify report via slk_helpers result_verify_job. We strongly suggest reading Reference: StrongLink verify reports prior to evaluating a verify report for the first time. The checked files are listed in the header of the verify report.

One verify job is limited to 50000 resources because the run time of the job increases considerably for higher numbers of resources. If the verification of more than 50000 files is requested, multiple verify jobs are submitted. All job ids are printed out, one job id per line. A verify job targeting 50000 resources runs for approximately 6 minutes.
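To estimate in advance how many job ids to expect, the limit can be turned into a small calculation. This sketch assumes that the files are simply packed into jobs of up to 50000 resources each (without --save-mode). The function name verify_job_count is hypothetical.

```shell
# Hypothetical helper: number of verify jobs needed for N resources,
# assuming at most 50000 resources per job (rounded up).
#   $1: number of resources to verify
verify_job_count() {
    echo $(( ($1 + 49999) / 50000 ))
}
```

For example, verify_job_count 120000 prints 3. With the run time quoted above, three full-size jobs would need roughly 18 minutes if they ran one after another, not counting waiting time in the queue.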

Verify jobs are submitted to the same StrongLink-internal queue as retrieval/recall jobs. Thus, if 100 retrieval/recall jobs are waiting in the queue, new verify jobs line up at the end and have to wait a long time. Non-admin users cannot submit new verify jobs if two or more jobs are already running under their user name. If a single submit_verify_job call needs to submit multiple verify jobs whose number exceeds the limit of two jobs per user, the command is allowed to do so if at least one job slot is free. Thus, more than two verify jobs might be running in certain situations.

Note

The option -i / --infile / --input is currently deactivated for normal users because it allows setting a few verify job options which might harm the speed or stability of the StrongLink system. Once it becomes possible to limit the usage of these options, we might release this parameter for general usage.

slk_helpers tape_barcode#

return barcode of tape with provided id

$ slk_helpers tape_barcode TAPE_ID

Returns the barcode of a tape with tape id TAPE_ID if it exists.

slk_helpers tape_exists#

check if tape exists

$ slk_helpers tape_exists (TAPE_ID|--tape-barcode TAPE_BARCODE)

Returns whether the tape with tape id TAPE_ID or tape barcode TAPE_BARCODE exists in the tape library or not.

slk_helpers tape_id#

return id of tape with provided barcode

$ slk_helpers tape_id TAPE_BARCODE

Returns the ID of a tape with tape barcode TAPE_BARCODE if it exists.

slk_helpers tape_library#

print name of tape library in which a tape is located

$ slk_helpers tape_library (TAPE_ID|--tape-barcode TAPE_BARCODE)

Returns the name of the tape library in which the tape is stored.

slk_helpers tape_status#

print status of a tape

$ slk_helpers tape_status [--details] (TAPE_ID|--tape-barcode TAPE_BARCODE)

--details: print a more detailled description of the retrieval status (different states of AVAILABLE are possible)

Prints the status of a tape with tape id TAPE_ID or tape barcode TAPE_BARCODE for retrievals: AVAILABLE, BLOCKED or ERRORSTATE. The meaning of each status is given in Tape Stati below. Please contact support@dkrz.de if you encounter a tape with ERRORSTATE.
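
In scripts, the exit code of tape_status (see Exit codes) is often easier to evaluate than the printed text. The following is a minimal sketch only: the function name tape_next_step and the returned keywords are hypothetical, and TAPE_BARCODE in the usage comment is a placeholder.

```shell
# Hypothetical helper: map the exit code of "slk_helpers tape_status"
# to a next step in a retrieval script.
tape_next_step() {
    case "$1" in
        0) echo "recall" ;;      # tape is available for reading
        1) echo "wait" ;;        # tape is blocked; retry later
        3) echo "reconnect" ;;   # connection timeout or no connection
        *) echo "abort" ;;       # any other error
    esac
}

# Possible use (TAPE_BARCODE is a placeholder):
#   slk_helpers tape_status --tape-barcode "$TAPE_BARCODE" > /dev/null
#   step=$(tape_next_step $?)
```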

slk_helpers tnsr#

please see total_number_search_results

slk_helpers total_number_search_results#

print number of results found by provided search

$ slk_helpers total_number_search_results [-q|--quiet] <SEARCH_ID>

-q, --quiet: print no warnings

Prints out the total number of search results. All search results independent of user permissions are counted. E.g. if a search found 10 results but the current user can only see 1 result, then slk list will print out this 1 result whereas slk_helpers tnsr will print 10.

Stati#

Tape Stati#

AVAILABLE: tape is fully available
BLOCKED: currently data is written onto the tape; recalls/retrievals targeting this tape will fail until the write process is finished; please wait a few hours
ERRORSTATE: tape is in an error state which needs to be reset; currently, no recall/retrieval from this tape is possible; please contact support@dkrz.de
UNAVAILABLE: only used in group_files_by_tape for files without storage information; no recall/retrieval possible
UNCLEAR: only used in group_files_by_tape for files stored on multiple tapes each; status of these tapes was not checked

Job Stati#

BLOCKED: job is blocked by another running job (please retry later; e.g. 60 min)
QUEUED: job is queued in StrongLink
PROCESSING: job is being processed (= files are read from tape)
PAUSED: job has been paused by a StrongLink admin; there is an issue with your job; please contact support@dkrz.de (data protection: StrongLink admins cannot view the job owner)
COMPLETED: job has been completed; was replaced by SUCCESSFUL and FAILED; might be returned in rare situations
SUCCESSFUL: job has been completed and was successful
FAILED: job has been completed and was not successful
ABORTED: job has been aborted by a StrongLink admin; there has been an issue with your job; please contact support@dkrz.de (data protection: StrongLink admins cannot view the job owner)
STOPPED: job has been stopped (very rare)
WAITING: job is waiting for something (very rare)
OTHER: other not clearly defined state (very rare)
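
A workflow script usually only needs to know whether to keep waiting, stop, or ask for help. The sketch below groups the job stati accordingly; the function name classify_job_status and the returned keywords are hypothetical, and the grouping is one possible reading of the list above.

```shell
# Hypothetical helper: reduce a job status to a coarse action keyword.
classify_job_status() {
    case "$1" in
        SUCCESSFUL)                echo "done" ;;
        FAILED)                    echo "failed" ;;
        COMPLETED)                 echo "check-report" ;;     # old status; outcome not encoded
        QUEUED|PROCESSING|WAITING) echo "wait" ;;
        BLOCKED)                   echo "retry-later" ;;
        PAUSED|ABORTED)            echo "contact-support" ;;
        *)                         echo "unknown" ;;           # STOPPED, OTHER, anything else
    esac
}
```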

Exit codes#

command	task	exit code
bad input command	always (redirected to slk help)	2
general (not help, version and session)	session expired	2
	issue related to config file	2
	connection timeout or connection could not be established	3
help	always	0
change (admins only)	change request was performed successfully	0
change (admins only)	any error except connection issue	2
checksum	resource exists and has checksum	0
	resource not found	1
	requested checksum not available	1
	resource path and resource ID provided	2
	any other error except connection issue	2
exists	resource exists	0
	resource does not exist	1
	any other error except connection issue	2
export_metadata	successful export	0
export_metadata	any error except connection issue	2
gen_file_query	query successfully generated	0
gen_file_query	any other error except connection issue	2
gen_search_query	query successfully generated	0
	a field name or a schema name does not exist or a value cannot be converted	1
	any other error except connection issue	2
gfbt (same as group_files_by_tape)	files successfully grouped	0
gfbt (same as group_files_by_tape)	any error except connection issue	2
group_files_by_tape	files successfully grouped	0
group_files_by_tape	any error except connection issue	2
hostname	hostname is set and is as printed	0
hostname	any error except connection issue	2
hsm2json	metadata exported successfully	0
hsm2json	any error except connection issue	2
init_watchers (initialize_watchers)	successful initialization and generation of watcher files	0
init_watchers (initialize_watchers)	any error except connection issue	2
initialize_watchers	successful initialization and generation of watcher files	0
initialize_watchers	any error except connection issue	2
iscached	resource exists and is cached	0
	resources exist and all of them are cached	0
	resource exists and is not cached	1
	resources exist and at least one is not cached	1
	resource(s) do(es) not exist	2
	resource path and resource ID provided	2
	resource path and search ID provided	2
	search ID and resource ID provided	2
	any other error except connection issue	2
is_admin_session	login token exists and belongs to an admin user	0
	login token exists but belongs to a normal user	1
	no login token	2
	session expired	2
	any error except connection issue	2
is_on_tape	resource exists and is on tape	0
	resources exist and all of them are on tape	0
	resource exists and is not on tape	1
	resources exist and at least one is not on tape	1
	resource(s) do(es) not exist	2
	resource path and resource ID provided	2
	resource path and search ID provided	2
	search ID and resource ID provided	2
	any other error except connection issue	2
json2hsm	metadata imported successfully	0
json2hsm	any error except connection issue	2
job_exists	job exists	0
	job does not exist	1
	any error except connection issue	2
job_queue	number of jobs printed successfully	0
job_queue	any error except connection issue	2
job_report	report of the job printed successfully	0
	job has failed or was aborted	1
	job is not finished yet	1
	any error except connection issue	2
job_status	status of the job printed successfully	0
	job has failed or was aborted	1
	any error except connection issue	2
has_no_flag_partial	no file is flagged as “partial file”	0
	at least one file is flagged as “partial file”	1
	any error except connection issue	2
list_clone_file	resource exists	0
	resource does not exist	1
	any error except connection issue	2
list_clone_search	search id correct and search results to print	0
	search id correct but no results to print	1
	search id does not exist	2
	any error except connection issue	2
list_search	search id correct and search results to print	0
	search id correct but no results to print	1
	search id does not exist	2
	any error except connection issue	2
metadata	resource exists and metadata available	0
	resource does not exist	1
	any error except connection issue	2
mkdir	namespace successfully created	0
	namespace with same name already exists	1
	any other error except connection issue	2
multi_touch	all targeted files could be touched	0
multi_touch	any other error except connection issue	2
print_rcrs	sizes and checksums of all file parts printed	0
	file has `0` byte and no storage info	0
	one or more checksums not available	1
	file `> 0` byte but has no storage info	2
	resource exists but is a namespace (folder)	2
	resource does not exist	2
	invalid combination of input parameters	2
	any other error except connection issue	2
recall	recall job started successfully	0
	at least one targeted tape is unavailable	1
	no resources to recall (e.g. all are cached)	1
	any error except connection issue	2
recall_needed	recall job started successfully	0
	at least one targeted tape is unavailable	1
	no resources to recall (e.g. all are cached)	1
	any error except connection issue	2
resourcepath	resource with given ID exists	0
	resource with given ID does not exist	1
	any error except connection issue	2
resource_features	all resources exist and requested features were available for all resources	0
	one or more resources do not exist	1
	one or more features of one or more resources were not available	1
	any error except connection issue	2
resource_id	resource(s) with given path exist(s)	0
	one or more resources do not exist	1
	any error except connection issue	2
resource_json	resource(s) with given path/id exist(s) and JSON metadata could be obtained	0
	one or more resources do not exist	1
	JSON metadata of one or more resources could not be obtained	1
	any error except connection issue	2
resource_path	resource with given ID exists	0
	resource with given ID does not exist	1
	any error except connection issue	2
resource_permissions	resource with given ID exists	0
	resource with given ID or path does not exist	1
	resource path and resource ID provided	2
	any error except connection issue	2
resource_tape	resource with given ID exists and is a file	0
	resource with given ID or path does not exist	1
	resource is a namespace	1
	resource path AND resource ID provided	2
	any error except connection issue	2
resource_tapes (same exit codes as resource_tape)	resource with given ID exists and is a file	0
	resource with given ID or path does not exist	1
	resource is a namespace	1
	resource path AND resource ID provided	2
	any error except connection issue	2
resource_type	resource with given ID exists	0
	resource with given ID or path does not exist	1
	resource path AND resource ID provided	2
	any error except connection issue	2
result_verify_job	job report successfully fetched and printed	0
	job id does not exist, job not finished or job finished unsuccessfully	1
	any error except connection issue	2
retrieve	resources successfully retrieved	0
	one or more resources could not be retrieved because they were not cached	1
	any error except connection issue (e.g. output file could not be written; unexpected retrieval error arose)	2
search_immediately	search successfully submitted	0
search_immediately	any error except connection issue	2
search_incomplete	search is incomplete (== search still running)	0
	search is complete (== search has finished)	1
	search id does not exist	2
	any error except connection issue	2
search_limited	search successfully performed	0
search_limited	any error except connection issue	2
search_status	search was successful	0
	search failed or is still running (incomplete)	1
	search id does not exist	2
	any error except connection issue	2
search_successful	search was successful	0
	search failed or is still running (incomplete)	1
	search id does not exist	2
	any error except connection issue	2
searchid_exists	search id exists	0
	search id does not exist	1
	any error except connection issue	2
session	login token exists and is not expired	0
	no login token	1
	session expired	1
	any error except connection issue	2
size	resource exists (file or namespace)	0
	resource with given ID or path does not exist	1
	resource path and resource ID provided	2
	any other error except connection issue	2
start_watchers	requested watcher(s) started successfully	0
start_watchers	any error except connection issue	2
stop_watchers	requested watcher(s) stopped successfully	0
stop_watchers	any error except connection issue	2
submit_verify_job	verify job successfully submitted	0
	user reached allowed job limit of two jobs	1
	wrong combination of input parameters	2
	any other error except connection issue	2
tape_barcode	tape exists	0
	tape does not exist	1
	any error except connection issue	2
tape_exists	tape exists	0
	tape does not exist	1
	tape barcode AND tape ID provided	2
	any error except connection issue	2
tape_id	tape exists	0
	tape does not exist	1
	any error except connection issue	2
tape_library	tape exists	0
	tape does not exist	1
	tape barcode AND tape ID provided	2
	any error except connection issue	2
tape_status	tape is available for reading	0
	tape is blocked / currently no reading	1
	tape barcode AND tape ID provided	2
	any error except connection issue	2
tnsr (same as total_number_search_results)	search was successful	0
	search failed or is still running (incomplete)	1
	search id does not exist	1
	any error except connection issue	2
total_number_search_results	search was successful	0
	search failed or is still running (incomplete)	1
	search id does not exist	1
	any error except connection issue	2
touch (deactivated)	targeted resource (cached file) was touched	0
	targeted resource is not cached	1
	targeted resource is no file	1
	any error except connection issue	2
version	always	0
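
Because exit code 3 is reserved for connection timeouts, a script can retry on that code only and pass every other exit code through unchanged. The following is a minimal sketch under that assumption; the function name run_with_retry and its arguments are hypothetical and not part of slk_helpers.

```shell
# Usage: run_with_retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND only while it exits with code 3 (connection issue).
run_with_retry() {
    local max_attempts="$1"
    local delay="$2"
    shift 2
    local attempt=1
    local rc=0
    while :; do
        if "$@"; then rc=0; else rc=$?; fi
        # Stop on success, on any non-connection error, or when out of attempts.
        if [ "$rc" -ne 3 ] || [ "$attempt" -ge "$max_attempts" ]; then
            return "$rc"
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Possible use (TAPE_ID is a placeholder):
#   run_with_retry 5 60 slk_helpers tape_exists "$TAPE_ID"
```

Exit codes 1 and 2 are returned immediately, so the meaning given in the table above is preserved for the calling script.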

Technical background of selected commands#

slk_helpers gen_file_query#

The search query is generated as follows: The input file list is taken and each path is separated into filename (like basename PATH) and directory (like dirname PATH). All filenames in the same directory are grouped and a regular expression is generated which finds exactly these files. Then this expression is linked via an and to the respective directory in which these files are located. This is done for each distinct directory in the input. These search expressions are linked via an or at the top level.

It is checked whether a directory exists in StrongLink. An error is thrown if it doesn’t exist.

The resulting search query can be optimized in length by the user. We do not do this in the slk_helpers because it would add considerable complexity to the code.
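
The grouping described above can be sketched in a few lines of shell and awk. This is an illustration only, not the slk_helpers implementation: the function name build_file_query is hypothetical, the output uses a simplified "path=... AND name~..." notation instead of the real JSON query syntax, and the check whether a directory exists in StrongLink is omitted.

```shell
# Reads one file path per line from stdin and prints one combined expression.
build_file_query() {
    awk '
    # Escape regex metacharacters so that a filename only matches itself.
    function escape(s,    i, c, out) {
        out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (index("\\.^$*+?()[]{}|", c) > 0) out = out "\\"
            out = out c
        }
        return out
    }
    length($0) > 0 {
        # Find the last slash to split the path like dirname / basename.
        pos = 0
        for (k = length($0); k >= 1; k--) {
            if (substr($0, k, 1) == "/") { pos = k; break }
        }
        if (pos == 0) { dir = "." }
        else if (pos == 1) { dir = "/" }
        else { dir = substr($0, 1, pos - 1) }
        file = escape(substr($0, pos + 1))
        # Group the filenames of each distinct directory into one alternation.
        if (dir in names) {
            names[dir] = names[dir] "|" file
        } else {
            order[++ndirs] = dir
            names[dir] = file
        }
    }
    END {
        # One "directory AND regex" term per directory, joined by OR.
        for (k = 1; k <= ndirs; k++) {
            d = order[k]
            printf "%s(path=%s AND name~^(%s)$)", (k > 1 ? " OR " : ""), d, names[d]
        }
        if (ndirs > 0) printf "\n"
    }'
}
```

Files from the same directory end up in one regular expression, which is why the generated query grows with the number of distinct directories rather than with the number of files alone.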

Major Changes#

1.16.5 (2025-06-25)#

iscached did not properly print the caching state of split files; fixed

1.16.4 (2025-06-23)#

fixed issue with argument -d of cmd initialize_watchers
new command init_watchers which does the same as initialize_watchers

1.16.3 (2025-06-05)#

fixed bugs in command change (admins only)
improved output of resource_features and resource_json when resource does not exist

1.16.2 (2025-05-19)#

resource_features:
- recursive namespace processing is default and -R / --recursive is removed
- --full-basic renamed to --full
- --full-extended renamed to --full-with-tapes
resource_json:
- new namespace flags (--include-namespaces and --only-namespaces)
- recursive namespace processing is default and -R / --recursive is removed
- no error anymore when a resource does not exist

1.16.1 (2025-05-16)#

resource_features:
- new namespace flags (--include-namespaces and --only-namespaces)
- no error anymore when a resource does not exist
bugfixes

1.16.0 (2025-05-15)#

new command initialize_watchers which is a simplified version of group_files_by_tape / gfbt which is only made for generating input files for the watchers and which reads files from stdin (pipe input into command possible!)

1.15.1 (2025-05-15)#

command resource_features prints list of allowed features when --which-features is set

1.15.0 (2025-05-15)#

new commands:
- resource_features: print selection of all available attributes/features of provided resources; one resource per line
- resource_json: print raw JSON output of resource metadata
command stop_watchers does not throw an error anymore when no watcher is running (#83)
command start_watchers tries to start retrieve watcher even if start of recall watcher failed

1.14.4 (2025-05-07)#

fixed bug in slk_helpers start_watchers

1.14.3 (2025-05-06)#

command exists got new argument --resource-id: provided resource considered as resource id instead of resource path

1.14.2 (2025-05-06)#

fixed bug in representation of POSIX permissions by list_clone_* commands
command iscached was extended:
- accepts multiple resource ids as input when --resource-ids is set
- resources (path or id) can be piped into the command (e.g.: cat file_list.txt | slk_helpers iscached ...)
- prints resource ids instead of resource paths in the verbose output when --print-resource-ids is set
command gfbt / group_files_by_tape: writes more variables into config.sh when -wf1 / --retrieval-workflow-1 is set
command retrieve: new argument --slurm which is a shortcut for --run-as-slurm-job-with-account
new commands start_watchers and stop_watchers to start or stop recall / retrieve watchers from the current directory
minor bug fixes

1.14.1 (2025-04-11)#

corrections and extensions in the change command (namespaces still not captured correctly)
new argument --print-sizes to command resource_tapes
retrieve will skip files which have a mismatch between total size and sum of the sizes of the file parts (only affects split files)
recall will skip files which are marked as partial files

1.14.0 (2025-03-20)#

new admin-only commands resource_tapes and change
- resource_tapes: provide list of resources; command prints tape barcodes per provided resource
- change: change owner, group and/or mode of provided list of files
changed errors to warnings in recall_needed

1.13.3 (2025-02-20)#

fixed bug in slk_helpers gfbt --gen-search-query which printed debug output

1.13.2 (2025-01-31)#

new argument for slk_helpers retrieve: --run-as-slurm-job-with-account <ACCOUNT> which will generate a SLURM job script for the retrieval
removed argument -d from gfbt / group_files_by_tape because a user might expect it to be the short version of --destination although it was the short version of --details
fixed issue related to verify and recall jobs: in some situations, wrong job names were generated or compared
fixed a bug which caused slk_helpers retrieve to be unable to retrieve a regular 0-byte file from the cache
fixed a bug where a regular 0-byte file was not recognized as being available for retrieval

1.13.1 (2024-12-10)#

commands recall and retrieve automatically remove duplicates in the input resource list; the order of the resources is not preserved

1.13.0 (2024-12-06)#

important changes:
- gfbt / group_files_by_tape only evaluates regular expressions in the input if --regex or --evaluate-regex-in-input is set
- gfbt / group_files_by_tape got many new features
- new commands slk_helpers recall and retrieve are extended versions of the slk recall and slk retrieve commands; they can be used much more easily in automated workflows; slk_helpers retrieve starts no automatic recall but only retrieves files from the cache
- new command resource_id does the same as exists but accepts multiple resource paths as input
- list_clone_file accepts multiple resource paths as input
- result_verify_job goes through all files checked by the verify job and performs additional file checks which the verify job does not
general bug fixes:
- tapes which are not available anymore are ignored by most commands
- removed debugging comments from previous versions that were forgotten
new commands:
- recall
- recall_needed: checks whether a recall needs to be performed or not (does not check whether a recall is possible or not)
- resource_id: like exists but accepts multiple resource paths as input and has different output format
- retrieve
command checksum: fixed exit code
commands job_report and result_verify_job:
- ignored non-existing files in the past; now, they print them
- StrongLink might shorten the path of files like /arch/blub/test.nc~/test.nc to /arch/blub~/test.nc; for each non-existing file in the output of these commands we check this case, now
- result_verify_job identifies additional problematic files which are not recognized by verify jobs
command group_files_by_tape / gfbt, new arguments:
- --resource-ids: expect resource ids as input
- -dst <dst> / --destinationPath <dst>: ignore files which exist already in dst
- -ns: preserve original namespace in destinationPath
- -wf1 <dst> / --retrieval-workflow-1 <dst>: “workflow 1” => shortcut for --details --count-tapes -ns --write-resource-id --destinationPath <dst>
- -wrid / --write-resource-id: write resource ids per tape to text files with names files_tape_<tape barcode>.txt; further created files are files_all.txt (all resource ids), tapes.txt (all tapes for which the first type of file are created) and config.sh (parameters for watcher scripts); possibly, the files files_multipleTape.txt, files_notStored, files_ignored and/or files_cached are created
- -ao / --append-output: when -wrid is set and target files already exist, append output to them (error otherwise)
- -oo / --overwrite-output: when -wrid is set and target files already exist, overwrite them (error otherwise)
- regular expressions in the input are only evaluated when --regex / --evaluate-regex-in-input is set
command list_clone_search: can print resource ids instead of resource paths (--print-resource-ids)
command list_clone_file
- can print a fifth timestamp when --print-more-timestamps is set
- accepts multiple resource paths as input
- can print resource ids instead of resource paths (--print-resource-ids) in the right most column
- can read resource paths from a file --read-from-file <file>
- can read resource paths from stdin (on empty input)
command recall:
- starts recall job for provided resources and instantly returns StrongLink recall job id
- accepts a list of resource paths or resource ids or one search id
- resources or search id can be piped into the command (e.g.: cat file_list.txt | slk_helpers recall ...)
- if -d/--destination <dst> is set, only files not present in dst are recalled (files compared based on size and mtime)
command resource_id:
- works like exists but
- accepts multiple resource paths as input (provided in the command call, via stdin or via a file --read-from-file <file>)
- prints <resource path>: <resource id> or <resource path>: not exists or <resource path>: problem accessing resource (+ throws error)
command resource_tape got parameter --print-tape-barcode-only
command retrieve:
- starts retrieval of provided resources to the dst provided by -d/--destination <dst>
- accepts a list of resource paths or resource ids or one search id
- resources or search id can be piped into the command (e.g.: cat file_list.txt | slk_helpers retrieve ...)
- if -vv is set, detailed output per file is printed
- files not stored in the cache are not retrieved and no recall is started for them
- a file listing all resources which could not be retrieved can be returned via write-missing-to-file <output_file>
command size exits with an error if a file has an internal size mismatch which is not visible to the user (affected 15 files of 2 x 10^7 files; can only occur when the same file is archived multiple times in parallel to the same location)

1.12.10 (2024-04-12)#

total_number_search_results / tnsr returns exit code 1 when the search is not finished or when the search failed
tape_id returned the tape library name instead of the tape id (fixed; bug present since version 1.12.7)

1.12.9 (2024-04-09)#

fixed a bug which caused tape_status and group_files_by_tape / gfbt to exit with an error when a tape was blocked by a system-job

1.12.8 (2024-04-02)#

added a column containing the tape id to the output of list_clone_file and list_clone_search (column 9; path is now in column 10)

1.12.7 (2024-03-28)#

new hidden command tape_library (slk_helpers tape_library (<tape_id>|--tape-barcode <tape_barcode>))

1.12.6 (2024-03-22)#

improved information in error messages (related to resources and jobs)
updated estimation of verify jobs to submit in submit_verify_job

1.12.5 (2024-03-17)#

fixed command tnsr

1.12.4 (2024-03-17)#

removed false warnings from size command
new short version tnsr for the command total_number_search_results

1.12.3 (2024-03-14)#

new hidden command: list_clone_file
hidden command list_clone renamed to list_clone_search
fixed parameter --count in list_clone / list_clone_search
adapted value of source in json output of submit_verify_job

1.12.2 (2024-03-11)#

submit_verify_job: new parameter --results-per-page was hidden

1.12.1 (2024-03-11)#

reordered code for writing user information on executed command to the log
job_report: fixed few errors
print_rcrs: fixed output of resource id in error message
result_verify_job:
- improved error output
- does not throw error anymore but warning when numbers of submitted and checked files do not agree
- throws no error anymore when target namespace does not exist
search_status also returns exit code 1:
- exit code 0: search ended successfully
- exit code 1: running or failed (incl. search timeout)
- exit code 2: any error returned by StrongLink which is not captured by 1 or 3
- exit code 3: timeout while communicating with StrongLink
size: fixed warnings
submit_verify_job
- has new parameters --end-on-page and --results-per-page
- updated restart information when command timeouts
- modifications and improvements in internal structure

1.12.0 (2024-02-22)#

new parameter --json for commands print_rcrs, submit_verify_job and result_verify_job
fixed output of resource_tape when target file was not stored on a tape yet
list_clone (now list_clone_search): width of uid and gid columns increased
changes in submit_verify_job:
- parameter --resume-on-page is not hidden anymore
- fixed output of parameter --resume-on-page
- improved error messages
changes in result_verify_job:
- improved error messages
- remove resources which do not exist anymore from report
- captures a StrongLink bug when a file had the same name as its parent namespace
- new parameter --json
minor fix in generation of search query via gen_search_query
new hidden command search_status
basic logging into ~/.slk/slk-cli.log
commands can be disabled in /etc/stronglink.conf by adding a list with disabled commands via key disabled_commands: "disabled_commands":["cmd1", "cmd2", ...]

1.11.2 (2024-01-15)#

new hidden command list_clone (later renamed to list_clone_search):
- prints search results
- similar to list_search
- prints four time stamps (in this order):
  - mtime of the file (time stamp of “last modification of the file prior to the archival”)
  - time stamp of first file version in StrongLink
  - time stamp of current file version in StrongLink
  - time stamp of last StrongLink-internal copy process of this file (e.g. last recall)
various small code updates: better error handling; capture problematic situations; fixed typos
improved commands which are based on searches (group_files_by_tape, total_number_search_results, submit_verify_job, …):
- search timeouts are properly captured
- unknown/unexpected search stati are recognized
- better error handling when search does not finish or is not successful
print_rcrs updates:
- prints tape id and barcode
- got the parameter --json to print output as JSON

1.11.1 (2023-12-20)#

new command resource_tape which prints the tape(s) on which a resource is stored; can be used if gfbt / group_files_by_tape does not work due to high system load of StrongLink
fixed a bug in gfbt / group_files_by_tape which caused a wrong tape id / barcode to be used when a file was migrated from an old HPSS tape to a new tape

1.11.0 (2023-12-08)#

If files are in an unclear caching state, commands like iscached and is_on_tape will not exit with an error but throw a warning.
If files are in an unclear caching state, iscached will inform the user that files in unclear caching state exist and exit with an error. If -v or -vv is set, the files in unclear caching state will be listed.

Note

An unclear caching state should only occur while a file is being copied from tape to the HSM cache or when there is a connection issue between StrongLink and the HSM cache. Please contact support@dkrz.de when this happens.

1.10.2 (2023-11-29)#

updated verbose messages for size and result_verify_job commands
updated help text of size
removed debugging output from submit_verify_job

1.10.1 (2023-11-20)#

updated verbose messages for size command

1.10.0 (2023-11-17)#

better handling of connection timeouts with StrongLink
updated submit_verify_job:
- updated output when the command does not submit any verify job
- added a parameter --resume-on-page <n> to simplify resuming the command in times of many connection losses to StrongLink
- added a parameter --save-mode to start verify jobs for only 1000 files or less in order to simplify restarting the command in times of many connection losses
new command result_verify_job:
- list relevant errors of verify job (default; no special arguments)
- list checked files (--sources)
- get part of the header of the verify report (--header)
- list number of errors and checked files (--number-errors and --number-sources)
extended command size by new parameters:
- -R / --recursive for requesting the size of the content of folders recursively
- --pad-spaces-left for space padding to the left in order to align file/namespaces sizes when the command is called multiple times

1.9.10 (2023-10-23)#

add --quiet / -q to command total_number_search_results (hidden)

1.9.9 (2023-10-17)#

access constraints for requesting job information; non-admin users may only access:
- VERIFY jobs of the current user or
- COPY jobs, which do retrievals/recalls and were started by slk
commands submit_verify_job_files and submit_verify_job_namespace from previous release only allowed for admin users
new commands:
- submit_verify_job: run a verify job for a provided set of files
- is_admin_session: Check if the user is currently logged in as normal user or admin user
- search_incomplete: Prints whether the search is incomplete (still running)
- search_successful: Prints whether the search was successful
- search_immediately: Creates search and returns search id immediately, even if search is not finished (hidden; only for specific use cases)

1.9.8 (2023-10-05)#

new commands (hidden because not final versions):
- job_report: print a job report; e.g. of a verify job
- submit_verify_job_files: submit a verify-job for a list of files (as paths) or of resource ids
- submit_verify_job_namespace: submit a verify-job for a namespace (as path) or a resource/namespace id
- print_rcrs: print size and checksums of file parts (some HPSS files are stored as two parts on two tapes)
catch all HTTP status codes >= 400 every time an HTTP request is sent
new job states: BLOCKED, PAUSED, STOPPED, WAITING, OTHER

1.9.7 (2023-08-16)#

mkdir has a new argument -p / --parent which is similar to -R but throws no error when target exists and is a namespace/folder; thus, it behaves like the Linux mkdir -p in the terminal
changed error message which mkdir prints when it receives a path to a file as target

1.9.6 (2023-08-02)#

gfbt / group_files_by_tape has new parameter --print-resource-id which will print resource IDs instead of file paths

1.9.5 (2023-07-14)#

bug fixes regarding some old HPSS files used in gfbt

1.9.4 (2023-07-13)#

bug fixes in command gen_search_query:
- modified description of command
- added new field tape_barcode
- field smart_pool internally was compared against tape_barcode
- value of field smart_pool is now checked against list of existing Smart Pools

1.9.3 (2023-07-05)#

fixed conversion of dates to seconds since 1970 instead of milliseconds since 1970; relevant for gen_search_query
gen_search_query now also understands the operators <, >, <= and >=
removed unnecessarily created instances of ObjectMapper

1.9.2 (2023-06-30)#

new command gen_search_query:
- generates a JSON search query which can be run by slk search
- accepts search conditions/fields like netcdf.Project='abc', resources.birth_time='2023-01-01T13:00:00' or path=/arch/bm0146/k204221
- search conditions/fields are linked via and
- user can provide another existing search query via --search-query which is linked via and to the other input

1.9.1 (2023-06-16)#

search_limited: did not recognize certain failed searches in the past
iscached:
- no error thrown anymore when resource ids are inserted which represent namespaces
- correct output is printed when only one file was provided and -v or -vv is set
- fixed issue when checking caching status of 0 byte files
job_status:
- new job stati FAILED and SUCCESSFUL replace old status COMPLETED; COMPLETED might still be used
- returns exit code 1 if a job has status FAILED, ABORTED or ABORTING
new command is_on_tape:
- same as iscached but checks if files are on tape
- files, which are on tape and in the cache, are considered as being on tape
- this is NOT the inverse of iscached => a file can be on tape and in the cache
optional verbose and summary output is now printed to stderr instead of stdout:
- hsm2json: verbose output printed to stderr (but summary not)
- gfbt / group_files_by_tape: print verbose output (diagnostic purpose) to stderr
- search_limited: search status
these commands can handle searches of which one or more resources were deleted:
- iscached
- is_on_tape
- has_no_flag_partial
- hsm2json
- gfbt / group_files_by_tape
list_search may list already deleted files (not checked for performance reasons)

1.9.0 (2023-05-16)#

changed exit codes to 3 when a timeout error is thrown or a connection cannot be established
command has_flag_partial has been renamed to has_no_flag_partial
has_no_flag_partial behaves the same as iscached
iscached:
- prints a list of non-cached files when -v is set (now: negative list + summary; past: summary)
- prints a summary at the end when -vv is set (now: full file list + summary; past: full file list)
job_queue:
- new argument --format which has the same meaning as -i / --interpret
- new output format JSON / J
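
Because timeouts and failed connections are signalled by exit code 3 since this version, a script can tell them apart from other errors and retry. The following is a minimal sketch under stated assumptions: the function name retry_on_exit_3 and the variable RETRY_DELAY are hypothetical and not part of slk_helpers.

```shell
# Hypothetical helper (not part of slk_helpers): rerun a command while it
# exits with code 3, with at most MAX_ATTEMPTS attempts in total.
# RETRY_DELAY (seconds between attempts, default 10) is an assumed setting.
# usage: retry_on_exit_3 MAX_ATTEMPTS COMMAND [ARGS...]
retry_on_exit_3() {
    max_attempts="$1"
    shift
    attempt=1
    while :; do
        rc=0
        "$@" || rc=$?
        # stop on success, on any error other than 3, or when out of attempts
        if [ "$rc" -ne 3 ] || [ "$attempt" -ge "$max_attempts" ]; then
            return "$rc"
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-10}"
    done
}
```

For example, retry_on_exit_3 5 slk_helpers iscached RESOURCE_PATH would try the command up to five times and return the exit code of the last attempt.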

1.8.10 (2023-05-08)#

new command has_flag_partial to check whether a file is flagged as partial (incomplete) file

Warning

has_flag_partial was renamed to has_no_flag_partial in slk_helpers 1.9.0.

1.8.9 (2023-05-02)#

fixed an error in tape_status which was thrown when no barcode was provided
command job_queue has new optional argument -i <INTERPRET_TYPE> / --interpret <INTERPRET_TYPE> with these values for INTERPRET_TYPE (case insensitive):
- RAW/R: same as argument not set
- TEXT/T: print short textual interpretation of the queue status => none, short, medium, long, jammed
- DETAILS/D: print detailed textual interpretation of the queue status
- NUMERIC/N: print a number representing the queue status => 0 (==none), 1 (==short), …, 4 (==jammed)
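
The numeric interpretation is convenient for scripts. The following is a minimal sketch, assuming that slk_helpers job_queue -i NUMERIC prints nothing but the number on stdout; the function name queue_is_jammed is hypothetical and not part of slk_helpers.

```shell
# Hypothetical helper (not part of slk_helpers): succeed only if the queue
# status is reported as jammed (numeric value 4).
# Assumption: "slk_helpers job_queue -i NUMERIC" prints just the number.
# Exit codes: 0 = jammed, 1 = not jammed, 2 = status could not be read.
queue_is_jammed() {
    status="$(slk_helpers job_queue -i NUMERIC)" || return 2
    [ "$status" -eq 4 ]
}
```

A script could then delay new requests with a check such as: if queue_is_jammed; then echo "queue jammed, trying later"; fi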

1.8.8 (2023-04-25)#

new command searchid_exists
iscached now accepts a directory/namespace as input (with -R set)

1.8.7 (2023-04-13)#

changed exit code of checksum when a file is stored on more than one tape from 1 to 2
editorial changes in the changelog

1.8.6 (2023-04-06)#

minor bugfixes in the error output messages
properly exit when wrong parameters are provided (in some situations)

1.8.5 (2023-04-06)#

new flag --help to print the help for a specific command; e.g. slk_helpers --help mkdir will print the help for mkdir
new hidden flag --pid will print the Linux process id of the Java virtual machine
group_files_by_tape has new flags:
- --set-max-tape-number-per-search <N> / --smtnps <N> which causes the searches to be run not for one tape but for a maximum of N tapes; this applies only if fewer than 50 files are to be retrieved per search
- -v (same as --print-progress) and -vv for verbose and double-verbose output, respectively
no command help is printed when a command is used the wrong way
fixed: commands which expect a list of Strings did not recognize wrong parameters but interpreted them as items of the list

1.8.4 (2023-04-05)#

iscached and size also accept resource ids (via flag --resource-id) in addition to resource paths
iscached:
- also accepts search ids (via flag --search-id) in addition to resource paths and resource ids
- got the flags -v and -vv for verbose and double-verbose mode, respectively
resource_type and resource_permissions expect either a resource path (default) or a resource id via --resource-id
internal changes related to the new class Resource
group_files_by_tape:
- fixed the case in which a file has no storage information
- new parameter --search-query '<search_query>'
- internal searches are performed differently, which is partly more efficient
gen_file_query:
- new parameters --cached-only, --not-cached (currently not working) and --tape-barcodes TAPE1,TAPE2,...
- bugs in the JSON output were fixed
- files without storage information are handled properly
- width of status in normal text output (value in brackets) increased by one

1.8.3 (2023-03-23)#

iscached properly prints the cache
changes in the code base: new classes Resource and Checksums
group_files_by_tape / gfbt has flags --json and --json-pretty
fixed: checksum did not work after the update to 1.8.1

1.8.2 (2023-03-21)#

job_status: fixed a certain job status which caused job_status to fail

1.8.1 (2023-03-20)#

a file might be split into multiple parts, which are stored on separate tapes; this was not captured properly by the following commands and is fixed now:
- checksum: prints an error when a file is split because no checksums are available for the overall file (only for the file parts)
- gfbt: properly identifies files stored on multiple tapes

1.8.0 (2023-03-14)#

new commands:
- resource_permissions: print permissions of a resource
- resource_type: print type of a resource (‘namespace’ or ‘file’)
- resource_path: same as resourcepath
- tape_barcode: get tape barcode from tape id (barcode needed for search queries)
- tape_id: get tape id from barcode
new arguments:
- --tape-barcode is new for tape_exists and tape_status
- --print-tape-barcode, -c/--count-files and --print-progress are new for group_files_by_tape / gfbt
search_limited has been deprecated; please use slk search
tests for tape_id, tape_barcode, tape_exists, resource_type, resource_permissions

1.7.6 (2023-03-07)#

minor bugfixes in the output of the command metadata

1.7.5 (2023-03-01)#

extended the command json2hsm by the argument -j/--json-string JSON_STRING, which allows passing a JSON string directly to the command instead of writing it into a file. If a filename is provided in addition, an error is thrown.
the commands hsm2json and metadata have a new argument --print-hidden; they do not print the field netcdf.Data by default (and other sidecar data); these data are printed when the new argument --print-hidden is set

1.7.4 (2023-02-10)#

removed , in the output of group_files_by_tape
improved conversion of dates in hsm2json and json2hsm
hsm2json exports dates according to ISO 8601
changed JSON metadata standard from 2.1.0 to 2.1.1
- added JSON metadata key mime_type

1.7.3 (2023-02-06)#

added one missing internally used job status (PAUSING)
changed JSON metadata standard from 2.0.0 to 2.1.0
- added JSON metadata key protocol
- improved usage of JSON metadata key location
changed code structure
restructured code file for metadata

1.7.2 (2023-02-01)#

added one missing internally used job status (ABORTING)

1.7.1#

fixed new tape status ERRORSTATE

1.7.0#

new commands: job_exists, job_status, job_queue
group_files_by_tape (and tape_status):
- modified structure of the output
- new tape status ERRORSTATE when the tape is in a bad state which needs intervention from the support
increased timeout for the time to establish a connection
minor restructuring of the code

1.6.0#

new commands: tape_exists, tape_status, group_files_by_tape (+ short form gfbt)
minor restructuring of the code

1.5.8#

fixed errors related to processing of JSON returned by StrongLink
restructured code

1.5.7#

changes from 1.4.0 to 1.5.7

json2hsm / import_metadata:
- renamed command import_metadata to json2hsm
- removed parameter --update-only-one-resource
- --write-mode got new option CLEAN which cleans all metadata of the selected resource before setting the new metadata (clean == removes content)
- prints JSON formatted summaries when --print-json-summary is set
hsm2json / export_metadata:
- renamed command export_metadata to hsm2json
- hsm2json prints an export summary when new parameter --print-summary is set
- hsm2json prints JSON formatted summaries when --print-json-summary is set
- prints metadata as compact instead of pretty JSON when --write-compact-json is set
slk_helpers list_search: prints search results continuously (in contrast to slk list, which collects all search results first and then prints them together)
updated tests

1.4.0#

removed import_metadata_recursive
merged the three other import_metadata_* commands into import_metadata:
- --update-only-one-resource PATH_OR_ID => like import_metadata_one_file
- --use-res-id => like import_metadata_use_res_id
- none of the previous flags => like import_metadata_use_abs_path
JSON structure of metadata was incremented from v1.0.0 to v2.0.0; v2.0.0 is equal to the output of slk tag -display RESOURCE
removed -Q/--fully-quiet flag (fully quiet; suppress error messages)
readme updated

1.3.x#

new commands:
- export_metadata
- import_metadata_one_file
- import_metadata_recursive
- import_metadata_use_abs_path (hidden; meant for expert users)
- import_metadata_use_res_id (hidden; meant for expert users)
new flags / arguments
- slk_helpers metadata now has --alternative-output-format
slk_helpers gen_file_query accepts a file list in a string which is separated by newlines
minor bug fixes

1.2.x#

new commands
- gen_file_query: create a query string to search files, which are provided as input
- list_search: list search results (incl. path of resources)
- updated exit codes

1.1.x#

new commands
- iscached: prints out whether a file is cached (== quick access) or not
- search_limited: like slk search but works only for searches that yield 1000 results or fewer
- version: prints the version of slk_helpers