slk helpers: slk extension provided by DKRZ#
file version: 16 Jan 2023
current software versions: slk_helpers version 1.7.1
The slk_helpers is an extension to the slk. The slk is developed by StrongLink and belongs to the StrongLink HSM software. The slk_helpers has been developed at DKRZ to provide useful functionality that is not included in the slk. If specific usage information is missing on this help page or if you encounter errors, please contact support@dkrz.de.
Note
StrongLink uses the term “namespace” or “global namespace” (gns). A “namespace” is comparable to a “directory” or “path” on a common file system.
slk_helpers help#
$ slk_helpers help
Lists all commands.
slk_helpers version#
$ slk_helpers version
Prints the current slk_helpers version.
slk_helpers checksum#
$ slk_helpers checksum [-t CHECKSUM_TYPE] RESOURCE_PATH
-t, --type: checksum type (possible values: sha512, adler32); omit to print all available checksums
Prints the checksum(s) of a resource. If -t is set, the checksum of type CHECKSUM_TYPE is retrieved. Possible values are sha512 and adler32. If -t is not set, all available checksums are printed. It only works for files and not for namespaces. Namespaces do not have checksums.
StrongLink calculates two checksums of each archived file and stores them in the metadata. It compares the stored checksums with the file's actual checksums at certain stages of the archival and retrieval process. Users normally do not need to verify checksums manually, but they can do so if they wish. If a file has no checksum, it has not been fully archived yet (e.g. the copying is still in progress or the archival process was canceled).
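For users who do want to check manually, the comparison could look like the following minimal sketch. It assumes that slk_helpers checksum -t sha512 prints only the checksum value (possibly surrounded by whitespace), which this page does not specify. The function name verify_archived_checksum is hypothetical.

```shell
# Compare the sha512 checksum of a local file with the one stored in the HSM.
# Assumption: `slk_helpers checksum -t sha512 PATH` prints the bare checksum.
verify_archived_checksum() {
    local_file=$1
    archived_file=$2

    # sha512sum prints "<checksum>  <filename>"; keep the first field only
    local_sum=$(sha512sum "$local_file" | awk '{ print $1 }')

    # non-zero exit code: resource not found, checksum not available, or error
    archived_sum=$(slk_helpers checksum -t sha512 "$archived_file") || return 2
    # strip surrounding whitespace and newlines
    archived_sum=$(printf '%s' "$archived_sum" | tr -d '[:space:]')

    if [ "$local_sum" = "$archived_sum" ]; then
        echo "checksums match"
    else
        echo "checksums differ"
        return 1
    fi
}
```

It would be called as verify_archived_checksum LOCAL_FILE RESOURCE_PATH. A return value of 2 means that no checksum could be obtained, for example because the file is not fully archived yet.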
slk_helpers exists#
$ slk_helpers exists RESOURCE_PATH
Checks whether the resource RESOURCE_PATH exists. The resource id is returned if it exists. exists works for files and namespaces.
slk_helpers gen_file_query#
$ slk_helpers gen_file_query [-R] RESOURCE1 [RESOURCE2 [RESOURCE3 [...]]]
-R, --recursive: generate a query which does a recursive search
Generates a search query which can be used with slk search and slk_helpers search_limited to perform a search for the resources RESOURCE1, RESOURCE2, … . These can be either files or namespaces. If a filename without path is provided, then the file will be searched for everywhere in the HSM. Filenames may contain regular expressions but no bash wildcards/globs. The path to a file must not contain regular expressions. Detailed examples and explanations are given in Generate search queries for filenames.
slk_helpers gfbt#
Short form of group_files_by_tape; please see group_files_by_tape.
slk_helpers group_files_by_tape#
$ slk_helpers group_files_by_tape (RESOURCE1 [RESOURCE2 [RESOURCE3 [...]]]|--search-id SEARCH_ID) [-R] [--gen-search-query|--run-search-query] [--print-tape-id] [--print-tape-status]
RESOURCE1 [RESOURCE2 [RESOURCE3 [...]]] (<list of GNS paths>) or --search-id SEARCH_ID is mandatory as input. A combination of both is not allowed.
--gen-search-query: generate and print search query string(s) instead of the lists of files per tape, Default: false
-R, --recursive: search namespaces recursively for input files, Default: false
--run-search-query: generate and run search query string(s) instead of printing the lists of files per tape, and print the search id(s), Default: false
--search-id SEARCH_ID: use an existing search as input instead of a <list of GNS paths> (an error is thrown if a <list of GNS paths> is provided and --search-id is set)
--print-tape-id: print the tape id on the far left followed by a ":", Default: false
--print-tape-status: print the status (avail or blocked) of the tape of each file group; if -i/--print-tape-id is set, this is printed: TAPE_ID, TAPE_STATUS: FILES; if -i/--print-tape-id is not set, this is printed: TAPE_STATUS: FILES, Default: false
Receives a list of files or a search id as input. Looks up which files are stored in the HSM cache and which are not stored in the HSM cache but only on tape. Files on tape are grouped by tape: each line of the output contains all files which are on one tape. The user can directly create a search query for retrieving all files from one tape (--gen-search-query) or directly run this search (--run-search-query). In the latter case, the search id is printed per tape. To see the tape id and the tape status, use --print-tape-id and --print-tape-status, respectively.
slk_helpers hsm2json#
$ slk_helpers hsm2json [options] <GNS path>
--instant-metadata-record-output: not set: read the metadata records of all specified files and print them when the last record is read; if set: print a metadata record directly after it has been read. Needs -l/--write-json-lines to be set. Default: false
-o FILE, --outfile FILE: write the output into a file instead of stdout
-q, --quiet: print nothing to stdout (e.g. no summary), Default: false
-R, --recursive: export metadata from the HSM recursively (all files in sub-directories of the provided source path will be considered), Default: false
-r FILE, --restart-file FILE: set a restart file in which the processed metadata entries are listed (if restart file exists, listed files will be skipped)
-s SCHEMA[,SCHEMA[...]], --schema SCHEMA[,SCHEMA[...]]: export only metadata fields of listed schemata (comma-separated list without spaces)
-v, --verbose: activate verbose mode, Default: false
-l, --write-json-lines: write JSON-lines instead of normal JSON, Default: false
-m MODE, --write-mode MODE: select write mode when -o/--outfile is set, Default: ERROR, Possible Values: [OVERWRITE, ERROR]
--print-summary: print summary on how many metadata records have been processed
--write-compact-json: do not print metadata as pretty but as compact JSON; default is pretty JSON
Extracts metadata from one or more files in the HSM and returns it as JSON. See JSON structure for/of metadata import/export for details.
slk_helpers hostname#
$ slk_helpers hostname
Prints the hostname to which slk is currently connected or to which slk will connect. It should be archive.dkrz.de. This is the default value on each Levante node. You can override the default hostname by exporting the environment variable SLK_HOSTNAME (e.g. by export SLK_HOSTNAME=stronglink.hsm.dkrz.de in bash).
slk_helpers iscached#
$ slk_helpers iscached RESOURCE_PATH
Checks if the resource RESOURCE_PATH is stored in the HSM cache. A text message tells the user whether RESOURCE_PATH is cached. Additionally, the exit code is 0 if the resource is in the cache and 1 if it exists but is not cached (exit code: read the variable $? directly after the slk_helpers call). If a file is not stored in the cache then it is only stored on tape. Retrievals from tape take considerably longer than retrievals from cache.
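The exit code can be evaluated in a script as in the following minimal sketch. It relies only on the exit codes listed in the Exit codes table on this page (0: cached, 1: exists but not cached, 2: resource does not exist or another error). The function name report_cache_state is hypothetical.

```shell
# Report where a file currently resides, based on the exit code of
# `slk_helpers iscached` (0: cached, 1: not cached, 2: missing or error).
report_cache_state() {
    resource=$1
    rc=0
    # the text message is discarded; only the exit code is evaluated
    slk_helpers iscached "$resource" > /dev/null 2>&1 || rc=$?
    case $rc in
        0) echo "cached: $resource" ;;
        1) echo "tape only: $resource" ;;
        *) echo "missing or error: $resource" ;;
    esac
}
```

Calling report_cache_state for each file of a planned retrieval gives a quick overview of which files will need a slower retrieval from tape.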
slk_helpers json2hsm#
$ slk_helpers json2hsm [options] <SL-JSON metadata file> <GNS path>
-l, --expect-json-lines: consider the input file to be JSON-lines instead of normal JSON, Default: false
--ignore-non-existing-metadata-fields: if set: if a metadata field of the source metadata record does not exist in StrongLink then this metadata field is skipped. If not set: throw an error and exit as soon as a source metadata field does not exist in StrongLink. If this flag is not set but -k/--skip-bad-metadata-sets is set, then metadata records with non-existing metadata fields will be skipped. Default: false
--instant-metadata-record-update: not set: read the whole JSON file, collect all metadata updates and apply all updates in the end; if two metadata records exist for one resource, this will become apparent before any metadata are written. If set: write each metadata record to StrongLink directly after it has been read from the JSON file; if two metadata records exist for one resource, the first metadata record will be written to StrongLink and the duplication will remain undetected until the duplicate record is read from JSON. Default: false
-q, --quiet: print nothing to stdout (e.g. no summary), Default: false
-r FILE, --restart-file FILE: set a restart file in which the processed metadata entries are listed (if restart file exists, listed files will be skipped)
-s SCHEMA[,SCHEMA[...]], --schema SCHEMA[,SCHEMA[...]]: import only metadata fields of listed schemata (comma-separated list without spaces)
-k, --skip-bad-metadata-sets: skip damaged / incomplete metadata sets [default: throw error], Default: false
-v, --verbose: activate verbose mode, Default: false
-m MODE, --write-mode MODE: select write mode for metadata, Default: OVERWRITE, Possible Values: OVERWRITE, KEEP, ERROR, CLEAN (CLEAN: first, delete all metadata from the target schema and, then, write new metadata)
Reads metadata from a JSON file and writes it to archived files in the HSM. Uses the relative paths from the metadata records plus the base path provided by the user to identify the target files. See JSON structure for/of metadata import/export for details.
slk_helpers job_exists#
$ slk_helpers job_exists JOB_ID
Checks whether a tape read job with the given ID exists.
slk_helpers job_queue#
$ slk_helpers job_queue
Prints the status of the queue of tape read jobs. The output looks like this:
$ slk_helpers job_queue
total read jobs: 110
active read jobs: 12
queued read jobs: 98
slk_helpers job_status#
$ slk_helpers job_status JOB_ID
Checks the status of a tape read job with the given ID. The status is one of these: ABORTED, QUEUED, PROCESSING and COMPLETED. When the status is QUEUED, the place in the queue is appended in brackets, e.g.: QUEUED (12).
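A script can use these status values to wait for a tape read job to finish. The following is a minimal sketch, assuming that slk_helpers job_status prints the status string on stdout as described above. The function name wait_for_job and the default polling interval of 60 seconds are hypothetical choices.

```shell
# Poll a tape read job until it reaches a final state.
# Assumption: `slk_helpers job_status JOB_ID` prints the status on stdout.
wait_for_job() {
    job_id=$1
    interval=${2:-60}   # seconds between two status checks

    while true; do
        status=$(slk_helpers job_status "$job_id") || return 2
        case $status in
            COMPLETED*) echo "job $job_id completed"; return 0 ;;
            ABORTED*)   echo "job $job_id aborted"; return 1 ;;
            *)          echo "job $job_id: $status" ;;   # QUEUED (n) or PROCESSING
        esac
        sleep "$interval"
    done
}
```

For example, wait_for_job JOB_ID 300 would check the job every five minutes. A generous interval keeps the number of requests to the HSM low.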
slk_helpers list_search#
$ slk_helpers list_search [-f] [-d] SEARCH_ID
-f, --only-files: list only search results which are files (same as slk list)
-d, --only-directories, --only-namespaces: print only search results which are namespaces (cannot be printed by slk list)
--count COUNT: print not more than COUNT results (hidden; see note below)
--start START: skip the first START - 1 results (hidden; see note below)
Lists all search results of the search SEARCH_ID. Prints the full path of each search result. If -f and -d are provided at once, the output is the same as when neither argument is set.
Note
slk_helpers list_search collects all search results independent of whether the user has read permissions or not; --start and --count refer to these search results and not to the search results the user is allowed to see.
Warning
slk_helpers list_search SEARCH_ID first collects all search results and then prints them. This might take a while if many search results are found. However, a warning is printed if this is the case.
slk_helpers metadata#
$ slk_helpers metadata RESOURCE_PATH
--alternative-output-format: different format to print metadata (each row is: schema.field: value), Default: false
Prints the available metadata of a resource. It is the counterpart of slk tag: slk tag sets metadata, whereas slk_helpers metadata prints metadata.
slk_helpers mkdir#
$ slk_helpers mkdir [-R] GNS_PATH
-R: create parent namespaces recursively if they do not exist; Default: false
Creates a namespace in an already existing namespace (== create basename GNS_PATH in dirname GNS_PATH). This command works like mkdir on a Linux terminal. Nested namespaces are created recursively when -R is set (like mkdir -p on a Linux terminal).
slk_helpers resourcepath#
$ slk_helpers resourcepath RESOURCE_ID
Prints the path of the resource with the given resource id.
slk_helpers search_limited#
$ slk_helpers search_limited "RQL Search Query"
$ slk_helpers search_limited 'RQL Search Query'
This command conducts a background search for files that match the given query, which is written in a query language designed by StrongLink. The query language is described on the StrongLink query language page and in the StrongLink Command Line Interface Guide from page 6 onwards.
Note
Operators in queries start with a $. If a query is delimited by ", the $ has to be escaped by a leading \ (\$OPERATOR). Otherwise, the shell interprets the operator as a variable. Alternatively, use ' as delimiter.
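The effect of the two delimiters can be tried out without contacting the HSM. In the following sketch the query itself is only illustrative: the field name resources.size and the value are assumptions and not taken from this page. The variable gt stands for any unrelated shell variable that happens to exist.

```shell
# Single quotes: the shell leaves $gt untouched
query_single='{"resources.size": {"$gt": 1048576}}'

# Double quotes: every inner " and the $ must be escaped with a backslash
query_double="{\"resources.size\": {\"\$gt\": 1048576}}"

# Double quotes without escaping the $: the shell replaces $gt by the value
# of the variable gt, which corrupts the operator
gt=oops
query_broken="{\"resources.size\": {\"$gt\": 1048576}}"

printf '%s\n' "$query_single" "$query_double" "$query_broken"
```

The first two printed lines are identical, whereas the third no longer contains the operator. Single quotes are therefore usually the less error-prone choice.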
A search result ID (search_id) is returned if 1000 or fewer results were found. If more results were found, an error is printed and no search ID is returned. The limit of 1000 refers to the total number of results, some of which might not be visible to the user. The search ID can be used to list and retrieve files from the archive (see below).
One might need the user or group ids of respective users/groups to search files belonging to them. These ids are obtained as follows.
Get user id:
# get your user id
$ id -u
# get the id of any user
$ id USER_NAME -u
# get the passwd entry of any user (the third field is the user id)
$ getent passwd USER_NAME
# OR print only the user id
$ getent passwd USER_NAME | awk -F: '{ print $3 }'
# get user name from user id
$ getent passwd USER_ID | awk -F: '{ print $1 }'
Get group id:
# get group ID from group name
$ getent group GROUP_NAME
# OR
$ getent group GROUP_NAME | awk -F: '{ print $3 }'
# get group name from group id
$ getent group GROUP_ID | awk -F: '{ print $1 }'
# list all groups you are a member of, together with their ids
$ id
Please see our documentation of specific Usage Examples (slk Usage Examples) and the StrongLink Command Line Interface Guide for example calls of slk search. These can be used unchanged with slk_helpers search_limited.
Note
slk_helpers search_limited counts all files and namespaces that match the search query, regardless of whether the current user is allowed to see/read them. slk list lists only the files which the user is allowed to see. Therefore, slk_helpers search_limited might print an error that more than 1000 resources were found although fewer than 1000 of the matches are readable by the user. Moreover, different users might get different output from slk list for the same search id.
slk_helpers session#
$ slk_helpers session
Prints until when the current slk session is valid.
slk_helpers size#
$ slk_helpers size RESOURCE_PATH
Prints the file size in bytes.
slk_helpers tape_exists#
$ slk_helpers tape_exists TAPE_ID
Returns whether the tape with tape id TAPE_ID exists in the tape library or not.
slk_helpers tape_status#
$ slk_helpers tape_status [--details] TAPE_ID
--details: print a more detailed description of the retrieval status (different states of avail are possible)
Prints the status of a tape for retrievals: avail or blocked.
Exit codes#
| command | task | exit code |
|---|---|---|
| bad input command | always (redirected to slk help) | 2 |
| general (not help, version and session) | session expired | 2 |
| | issue related to config file | 2 |
| help | always | 0 |
| checksum | resource exists and has checksum | 0 |
| | resource not found | 1 |
| | requested checksum not available | 1 |
| | any other error | 2 |
| exists | resource exists | 0 |
| | resource does not exist | 1 |
| | any error | 2 |
| export_metadata | any error | 2 |
| gen_file_query | query successfully generated | 0 |
| | any error | 2 |
| gfbt (same as group_files_by_tape) | files successfully grouped | 0 |
| | any error | 2 |
| group_files_by_tape | files successfully grouped | 0 |
| | any error | 2 |
| hostname | hostname is set and is as printed | 0 |
| | any error | 2 |
| hsm2json | metadata exported successfully | 0 |
| | any error | 2 |
| iscached | resource exists and is cached | 0 |
| | resource exists and is not cached | 1 |
| | resource does not exist | 2 |
| | any error | 2 |
| json2hsm | metadata imported successfully | 0 |
| | any error | 2 |
| job_exists | job exists | 0 |
| | job does not exist | 1 |
| | any error | 2 |
| job_queue | number of jobs printed successfully | 0 |
| | any error | 2 |
| job_status | status of the job printed successfully | 0 |
| | any error | 2 |
| list_search | search id correct and search results to print | 0 |
| | search id correct but no results to print | 1 |
| | any error | 2 |
| metadata | resource exists and metadata available | 0 |
| | resource does not exist | 1 |
| | any error | 2 |
| mkdir | namespace successfully created | 0 |
| | namespace with same name already exists | 1 |
| | any other error | 2 |
| resourcepath | resource with given ID exists | 0 |
| | resource with given ID does not exist | 1 |
| | any error | 2 |
| search_limited | search successfully performed | 0 |
| | any error | 2 |
| session | login token exists and is not expired | 0 |
| | no login token | 1 |
| | session expired | 1 |
| size | resource exists and has size | 0 |
| | resource does not exist | 1 |
| | any error | 2 |
| tape_exists | tape exists | 0 |
| | tape does not exist | 1 |
| | any error | 2 |
| tape_status | tape is available for reading | 0 |
| | tape is blocked / currently no reading | 1 |
| | any error | 2 |
| version | always | 0 |
Technical background of selected commands#
slk_helpers gen_file_query#
The search query is generated as follows: The input file list is taken and each path is split into filename (like basename PATH) and directory (like dirname PATH). All filenames in the same directory are grouped and a regular expression is generated which finds exactly these files. Then this expression is linked via an and to the respective directory in which these files are located. This is done for each distinct directory in the input. These search expressions are linked via an or at the top level.
It is checked whether each directory exists in StrongLink. An error is thrown if it does not exist.
The resulting search query can be shortened by the user. We do not do this in the slk_helpers because it would add considerable complexity to the code.
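The grouping step described above can be illustrated with a few lines of shell. This is only a sketch of the idea and not the implementation used in the slk_helpers: the function name group_by_directory is hypothetical, the filenames are simply joined by | as a stand-in for the generated regular expression, and no actual StrongLink query is produced. Paths containing tabs or newlines are not handled.

```shell
# Group the filenames of the given paths by their directory.
# Output: one line per distinct directory, "DIRECTORY: name1|name2|..."
group_by_directory() {
    for path in "$@"; do
        # split each path into directory and filename, separated by a tab
        printf '%s\t%s\n' "$(dirname "$path")" "$(basename "$path")"
    done | LC_ALL=C sort | awk -F'\t' '
        # a new directory starts: print the previous group and start a new one
        $1 != dir { if (NR > 1) print dir ": " names; dir = $1; names = $2; next }
        # same directory as before: append the filename to the current group
        { names = names "|" $2 }
        END { if (NR > 0) print dir ": " names }'
}
```

Each output line corresponds to one directory-plus-filenames expression; in the real query these expressions would be linked via an or at the top level.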
Major Changes#
1.7.1#
- fixed new tape status ERRORSTATE
1.7.0#
- new commands: job_exists, job_status, job_queue
- group_files_by_tape (and tape_status): modified structure of the output
- new tape status ERRORSTATE when the tape is in a bad state which needs intervention from the support
- increased timeout for the time to establish a connection
- minor restructuring of the code
1.6.0#
- new commands: tape_exists, tape_status, group_files_by_tape (+ short form gfbt)
- minor restructuring of the code
1.5.8#
- fixed errors related to processing of JSON returned by StrongLink
- restructured code
1.5.7#
changes from 1.4.0 to 1.5.7
- json2hsm / import_metadata:
  - renamed command import_metadata to json2hsm
  - removed parameter --update-only-one-resource
  - --write-mode got new option CLEAN which cleans all metadata of the selected resource before setting the new metadata (clean == removes content)
  - print JSON formatted summaries when --print-json-summary is set
- hsm2json / export_metadata:
  - renamed command export_metadata to hsm2json
  - hsm2json prints an export summary when new parameter --print-summary is set
  - hsm2json prints JSON formatted summaries when --print-json-summary is set
  - do not print metadata as pretty but as compact JSON when --write-compact-json is set
- slk_helpers list_search: print search results continuously (in contrast to collecting all search results, first, before printing them altogether as slk list does it)
- updated tests
1.4.0#
- removed import_metadata_recursive
- merged the three other import_metadata_* commands to import_metadata:
  - --update-only-one-resource PATH_OR_ID => like import_metadata_one_file
  - --use-res-id => like import_metadata_use_res_id
  - none of the previous flags => like import_metadata_use_abs_path
- JSON structure of metadata was incremented from v1.0.0 to v2.0.0; v2.0.0 is equal to the output of slk tag -display RESOURCE
- removed -Q / --fully-quiet flag (fully quiet; suppress error messages)
- readme updated
1.3.x#
- new commands:
  - export_metadata
  - import_metadata_one_file
  - import_metadata_recursive
  - import_metadata_use_abs_path (hidden; meant for expert users)
  - import_metadata_use_res_id (hidden; meant for expert users)
- new flags / arguments
  - slk_helpers metadata now has --alternative-output-format
  - slk_helpers gen_file_query: a file list in a string which is separated by newlines
- minor bug fixes
1.2.x#
- new commands
  - gen_file_query: create a query string to search files, which are provided as input
  - list_search: list search results (incl. path of resources)
- updated exit codes
1.1.x#
- new commands
  - iscached: prints out whether a file is cached (== quick access) or not
  - search_limited: like slk search but works only for searches that yield 1000 results or fewer
  - version: prints the version of slk_helpers