Changelog: slk_helpers v1.13.2

Update from slk_helpers v1.12.10 to v1.13.2

see here for changes from slk_helpers v1.10.2 to 1.12.10

see here for changes from slk_helpers v1.9.7 to v1.10.2

see here for changes from slk_helpers v1.9.5 to v1.9.9

see here for changes from slk_helpers v1.9.3 to v1.9.5

see here for changes from slk_helpers v1.9.0 to v1.9.3

see here for changes from slk_helpers v1.8.10 to v1.9.0

see here for changes from slk_helpers v1.8.2 to v1.8.10

see here for changes from slk_helpers v1.7.4 to v1.8.2

see here for changes from slk_helpers v1.7.1 to v1.7.4

see here for changes from slk_helpers v1.6.0 to v1.7.1

see here for changes from slk_helpers v1.5.8 to v1.6.0

see here for changes from slk_helpers v1.5.7 to v1.5.8

see here for changes from slk_helpers v1.2.x to v1.5.7

Please have a look here for a detailed description of the new features and here for an incremental changelog.

Breaking changes

gfbt / group_files_by_tape: Regular expressions in the input path are no longer evaluated automatically. Please set the parameter --regex or --evaluate-regex-in-input if the input contains regular expressions.
Verify jobs: A few rare types of incomplete files are not detected by verify jobs. Therefore, a postprocessing routine was added to result_verify_job to identify such files. This routine may increase the command's runtime considerably if several thousand files are checked. You may set the parameter --quick to skip this additional check, which we do not recommend.

Major changes

gfbt / group_files_by_tape only evaluates regular expressions in the input if --regex or --evaluate-regex-in-input is set
gfbt / group_files_by_tape received many new features
The new commands slk_helpers recall and slk_helpers retrieve are extended versions of the slk recall and slk retrieve commands. They are much easier to use in automated workflows. slk_helpers retrieve does not start a recall automatically but only retrieves files from the cache.
resource_id does the same as exists but accepts multiple resource paths as input
list_clone_file accepts multiple resource paths as input
result_verify_job goes through all files checked by the verify job and performs additional verification checks that the verify job does not

Details on new commands

resource_id:
- works like exists but …
- accepts multiple resource paths as input (provided in the command call, via stdin or via a file --read-from-file <file>)
- prints "<resource path>: <resource id>", "<resource path>: not exists" or "<resource path>: problem accessing resource" (the latter also throws an error)
recall (development; might change behaviour; please coordinate with DKRZ support prior to usage):
- starts a recall job for the provided resources and immediately returns the StrongLink recall job id
- accepts a list of resource paths or resource ids or one search id
- resources or search id can be piped into the command (e.g.: cat file_list.txt | slk_helpers recall ...)
- if -d/--destination <dst> is set, only files not present in <dst> are recalled (files are compared by size and mtime)
retrieve (development; might change behaviour; please coordinate with DKRZ support prior to usage):
- starts the retrieval of the provided resources to the destination <dst> given by -d/--destination <dst>
- accepts a list of resource paths or resource ids or one search id
- resources or search id can be piped into the command (e.g.: cat file_list.txt | slk_helpers retrieve ...)
- if -vv is set, detailed output per file is printed
- files not stored in the cache are not retrieved and no recall is started for them
- a file listing all resources that could not be retrieved can be written via write-missing-to-file <output_file>
- parameter --run-as-slurm-job-with-account <ACCOUNT> creates a SLURM job script for the retrieval, which resubmits itself automatically until all files have been retrieved; unless --dry-run is set, the script is submitted directly as a SLURM job; the job does not start a recall but only copies files from the cache to the user
recall_needed (development; might change behaviour; please coordinate with DKRZ support prior to usage):
- accepts the same parameters as recall; behaves like slk_helpers recall --dry-run
- checks whether a recall needs to be performed or not (does not check whether a recall is possible or not)
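Because resource_id prints one line per input resource, its output is easy to post-process in scripts. The following Python function is a minimal sketch and not part of slk_helpers. The name parse_resource_id_output is hypothetical. The function assumes the three line formats listed for resource_id above, and it also assumes that resource ids are purely numeric.

```python
from typing import Dict, Tuple


def parse_resource_id_output(text: str) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Split resource_id output into found ids and problematic paths.

    Assumed line formats (taken from the changelog):
      '<path>: <id>', '<path>: not exists', '<path>: problem accessing resource'
    """
    found: Dict[str, int] = {}
    problems: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at the LAST ': ' so a path containing ': ' is kept intact.
        path, sep, status = line.rpartition(": ")
        if not sep:
            # Line without the expected separator, e.g. an error message.
            problems[line] = "unparsed line"
        elif status.isdigit():
            found[path] = int(status)  # assumption: ids are numeric
        else:
            problems[path] = status
    return found, problems
```

Paths in the problems dictionary could, for example, be collected and reported before starting a recall.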
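When a destination <dst> is given to recall, files are skipped if they already exist locally with the same size and mtime. The following Python sketch illustrates that comparison and is not the slk_helpers implementation. The ArchivedFile record and the files_to_recall function are hypothetical. The mapping of an archived file to a path below <dst> is an assumption, and mtimes are assumed to be compared in whole seconds.

```python
import os
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ArchivedFile:
    """Hypothetical record describing one archived file."""
    relative_path: str  # assumed location of the file below <dst>
    size: int           # size in bytes
    mtime: int          # modification time in whole seconds since the epoch


def files_to_recall(files: Iterable[ArchivedFile], dst: str) -> List[ArchivedFile]:
    """Return the files that are missing in dst or differ in size or mtime."""
    needed: List[ArchivedFile] = []
    for f in files:
        local = os.path.join(dst, f.relative_path)
        try:
            st = os.stat(local)
        except FileNotFoundError:
            needed.append(f)  # not present locally: recall needed
            continue
        if st.st_size != f.size or int(st.st_mtime) != f.mtime:
            needed.append(f)  # present but different: recall needed
    return needed
```

Comparing metadata instead of checksums keeps the check cheap. The trade-off is that a local file with identical size and mtime but different content would not be recalled.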

All fixes and new features of existing commands

general bug fixes:
- tapes which are not available anymore are ignored by most commands
- removed forgotten debugging comments from previous versions
- fixed a bug where a regular 0-byte file was not recognized as being available for retrieval
command checksum: fixed exit code
commands job_report and result_verify_job:
- previously ignored non-existing files; now they print them
- StrongLink might shorten the path of files like /arch/blub/test.nc~/test.nc to /arch/blub~/test.nc; these commands now check for this case for each non-existing file in their output
- result_verify_job identifies additional problematic files which are not recognized by verify jobs
commands group_files_by_tape / gfbt, new arguments:
- --resource-ids: expect resource ids as input
- -dst <dst> / --destinationPath <dst>: ignore files which already exist in <dst>
- -ns: preserve original namespace in destinationPath
- -wf1: “workflow 1” => shortcut for --details --count-tapes -ns --write-resource-id --destinationPath <dst>
- -wrid / --write-resource-id:
  - writes the resource ids per tape to text files named files_tape_<tape barcode>.txt,
  - creates the file files_all.txt (all resource ids),
  - creates the file tapes.txt (all tapes for which files of the first type are created),
  - creates the file config.sh (parameters for watcher scripts),
  - possibly creates the file files_multipleTape.txt,
  - possibly creates the file files_notStored,
  - possibly creates the file files_ignored,
  - possibly creates the file files_cached
- -ao / --append-output: when -wrid is set and target files already exist, append output to them (error otherwise)
- -oo / --overwrite-output: when -wrid is set and target files already exist, overwrite them (error otherwise)
- regular expressions in the input are only evaluated when --regex / --evaluate-regex-in-input is set
- removed argument -d from gfbt / group_files_by_tape because users might expect it to be the short version of --destination, although it was the short version of --details
commands list_clone_search and list_clone_file can print resource id instead of resource path (--print-resource-id)
command list_clone_file:
- can print a fifth timestamp when --print-more-timestamps is set
- accepts multiple resource paths as input
- can print resource ids instead of resource paths (--print-resource-ids) in the rightmost column
- can read resource paths from a file (--read-from-file <file>)
- can read resource paths from stdin (on empty input)
command resource_tape got parameter --print-tape-barcode-only
command size exits with an error if a file has an internal size mismatch that is not visible to the user (this affected 15 of 2 x 10^7 files and can only occur when the same file is archived to the same location multiple times in parallel)
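job_report and result_verify_job now also look for the shortened path of each non-existing file. The following Python sketch derives such a candidate path. The rule is generalized from the single example given above, in which <dir>/<name>~/<name> becomes <dir>~/<name>. This generalization and the function name shortened_candidate are assumptions, and StrongLink may shorten paths in other ways as well.

```python
import posixpath
from typing import Optional


def shortened_candidate(path: str) -> Optional[str]:
    """Return the assumed shortened form of path, or None if the pattern does not apply.

    Assumed rule, inferred from one example:
      /arch/blub/test.nc~/test.nc  ->  /arch/blub~/test.nc
    """
    parent, name = posixpath.split(path)
    grandparent, parent_name = posixpath.split(parent)
    # The parent directory must be named '<file name>~' and must not sit at the root.
    if name and parent_name == name + "~" and grandparent not in ("", "/"):
        return grandparent + "~/" + name
    return None
```

Returning None for paths that do not match the pattern lets a caller run this check only on the files that may have been shortened.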
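The per-tape files written by -wrid / --write-resource-id can be read back, for example to process one tape at a time. The following Python function is a minimal sketch. The name read_tape_groups is hypothetical. The function assumes that each files_tape_<tape barcode>.txt file contains whitespace-separated resource ids, for example one per line.

```python
import glob
import os
import re
from typing import Dict, List

# File name pattern files_tape_<tape barcode>.txt as described for -wrid.
TAPE_FILE = re.compile(r"^files_tape_(?P<barcode>.+)\.txt$")


def read_tape_groups(directory: str) -> Dict[str, List[str]]:
    """Map each tape barcode to the resource ids listed in its files_tape_*.txt file."""
    groups: Dict[str, List[str]] = {}
    for path in sorted(glob.glob(os.path.join(directory, "files_tape_*.txt"))):
        match = TAPE_FILE.match(os.path.basename(path))
        if match is None:
            continue
        with open(path, encoding="utf-8") as fh:
            # Assumption: ids are separated by whitespace (e.g. one per line).
            groups[match.group("barcode")] = fh.read().split()
    return groups
```

The glob pattern deliberately skips files_all.txt and files_multipleTape.txt, so only the per-tape lists are returned. Ids are kept as strings to avoid assuming their exact format.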

Known issues

list_search and list_clone_search may list already deleted files (not checked for performance reasons)


03 February 2025
