Changelog: slk_helpers v1.13.2

Update from slk_helpers v1.12.10 to v1.13.2

see here for changes from slk_helpers v1.10.2 to 1.12.10

see here for changes from slk_helpers v1.9.7 to v1.10.2

see here for changes from slk_helpers v1.9.5 to v1.9.9

see here for changes from slk_helpers v1.9.3 to v1.9.5

see here for changes from slk_helpers v1.9.0 to v1.9.3

see here for changes from slk_helpers v1.8.10 to v1.9.0

see here for changes from slk_helpers v1.8.2 to v1.8.10

see here for changes from slk_helpers v1.7.4 to v1.8.2

see here for changes from slk_helpers v1.7.1 to v1.7.4

see here for changes from slk_helpers v1.6.0 to v1.7.1

see here for changes from slk_helpers v1.5.8 to v1.6.0

see here for changes from slk_helpers v1.5.7 to v1.5.8

see here for changes from slk_helpers v1.2.x to v1.5.7

Please have a look here for a detailed description of the new features and here for an incremental changelog.

Breaking changes

  • gfbt / group_files_by_tape: Regular expressions in the input path are no longer evaluated automatically. Please set the parameter --regex or --evaluate-regex-in-input if the input contains regular expressions.

  • Verify Jobs: A few rare types of incomplete files are not detected by verify jobs. Therefore, a postprocessing routine was added to result_verify_job to identify such files. This routine may increase the command’s runtime considerably if several thousand files are checked. You may set the parameter --quick to skip this additional check, but we do not recommend doing so.

Major changes

  • gfbt / group_files_by_tape only evaluates regular expressions in the input if --regex or --evaluate-regex-in-input is set

  • gfbt / group_files_by_tape gained many new features

  • The new commands slk_helpers recall and slk_helpers retrieve are extended versions of the slk recall and slk retrieve commands. They are much easier to use in automated workflows. slk_helpers retrieve does not start a recall automatically; it only retrieves files that are already in the cache.

  • resource_id does the same as exists but accepts multiple resource paths as input

  • list_clone_file accepts multiple resource paths as input

  • result_verify_job goes through all files checked by the verify job and runs additional verification checks on them that the verify job itself does not perform

Details on new commands

  • resource_id:
    • works like exists but …

    • accepts multiple resource paths as input (provided as arguments of the command call, via stdin or from a file via --read-from-file <file>)

    • prints “<resource path>: <resource id>”, “<resource path>: not exists” or “<resource path>: problem accessing resource” (in the last case, an error is thrown as well)

  • recall (development; might change behaviour; please coordinate with DKRZ support prior to usage):
    • starts a recall job for the provided resources and immediately returns the StrongLink recall job id

    • accepts a list of resource paths or resource ids or one search id

    • resources or search id can be piped into the command (e.g.: cat file_list.txt | slk_helpers recall ...)

    • if -d/--destination <dst> is set, only files not present in <dst> are recalled (files are compared by size and mtime)

  • retrieve (development; might change behaviour; please coordinate with DKRZ support prior to usage):
    • retrieves the provided resources to the destination <dst> given by -d/--destination <dst>

    • accepts a list of resource paths or resource ids or one search id

    • resources or search id can be piped into the command (e.g.: cat file_list.txt | slk_helpers retrieve ...)

    • if -vv is set, detailed output per file is printed

    • files not stored in the cache are not retrieved and no recall is started for them

    • a file listing all resources which could not be retrieved can be returned via write-missing-to-file <output_file>

    • the parameter --run-as-slurm-job-with-account <ACCOUNT> creates a SLURM job script for the retrieval, which automatically re-submits itself until all files have been retrieved; unless --dry-run is set, the script is submitted directly as a SLURM job; the job does not start a recall but only copies files from the cache to the user

  • recall_needed (development; might change behaviour; please coordinate with DKRZ support prior to usage):
    • takes the same parameters as recall; behaves like slk_helpers recall --dry-run

    • checks whether a recall needs to be performed or not (does not check whether a recall is possible or not)
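To illustrate how the resource_id output and the size/mtime comparison of recall -d could be used in a script, here is a minimal Python sketch. It is not slk_helpers code: the line format is taken from the description of resource_id above, the assumption that resource ids are purely numeric is ours, the helper names parse_resource_id_output and files_to_recall are hypothetical, and the mtime precision used by slk_helpers is not documented here, so the sketch compares mtimes at one-second resolution.

```python
import os
from typing import Dict, Iterable, List, Optional, Tuple


def parse_resource_id_output(lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each resource path to its resource id, or None if no id was printed.

    Assumes one line per resource of the form '<resource path>: <value>', where
    <value> is a numeric resource id, 'not exists' or 'problem accessing resource'.
    """
    result: Dict[str, Optional[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # split at the last ': ' so that a colon inside the path is kept
        path, sep, value = line.rpartition(": ")
        if not sep:
            continue  # line does not follow the assumed format
        result[path] = value if value.isdigit() else None
    return result


def files_to_recall(archived: Dict[str, Tuple[int, int]], dst: str) -> List[str]:
    """Return the relative paths whose copy in dst is missing or differs.

    archived maps a path relative to dst to (size in bytes, mtime in seconds);
    a local file counts as present only if both size and mtime match.
    """
    needed: List[str] = []
    for rel_path, (size, mtime) in sorted(archived.items()):
        try:
            st = os.stat(os.path.join(dst, rel_path))
        except FileNotFoundError:
            needed.append(rel_path)  # not present locally
            continue
        # compare mtime at one-second resolution (an assumption)
        if st.st_size != size or int(st.st_mtime) != mtime:
            needed.append(rel_path)
    return needed


if __name__ == "__main__":
    # hypothetical resource_id output; paths and ids are made up
    sample = [
        "/arch/ab1234/exp/file1.nc: 49058705",
        "/arch/ab1234/exp/file2.nc: not exists",
    ]
    print(parse_resource_id_output(sample))
    print(files_to_recall({"file1.nc": (1024, 1700000000)}, "/tmp/does_not_exist"))
```

Splitting at the last ": " keeps the parser robust against colons in resource paths, since none of the three documented value forms contains ": " itself.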

All fixes and new features of existing commands

  • general bug fixes:
    • tapes which are not available anymore are ignored by most commands

    • removed forgotten debugging comments from previous versions

    • fixed a bug where a regular 0-byte file was not recognized as being available for retrieval

  • command checksum: fixed exit code

  • command job_report and result_verify_job
    • previously ignored non-existing files; now they print them

    • StrongLink might shorten the path of files like /arch/blub/test.nc~/test.nc to /arch/blub~/test.nc; these commands now check this case for each non-existing file in their output

    • result_verify_job identifies additional problematic files which are not recognized by verify jobs

  • commands group_files_by_tape / gfbt, new arguments:
    • --resource-ids: expect resource ids as input

    • -dst <dst> / --destinationPath <dst>: ignore files which already exist in <dst>

    • -ns: preserve original namespace in destinationPath

    • -wf1: “workflow 1” => shortcut for --details --count-tapes -ns --write-resource-id --destinationPath <dst>

    • -wrid / --write-resource-id:
      • writes resource ids per tape to text files named files_tape_<tape barcode>.txt,

      • creates the file files_all.txt (all resource ids),

      • creates the file tapes.txt (all tapes for which a files_tape_<tape barcode>.txt file is created),

      • creates the file config.sh (parameters for watcher scripts),

      • possibly creates the file files_multipleTape.txt,

      • possibly creates the file files_notStored,

      • possibly creates the file files_ignored,

      • possibly creates the file files_cached

    • -ao / --append-output: when -wrid is set and target files already exist, append output to them (error otherwise)

    • -oo / --overwrite-output: when -wrid is set and target files already exist, overwrite them (error otherwise)

    • regular expressions in the input are only evaluated when --regex / --evaluate-regex-in-input is set

    • removed argument -d from gfbt / group_files_by_tape because users might expect it to be the short version of --destination although it was the short version of --details

  • commands list_clone_search and list_clone_file can print resource ids instead of resource paths (--print-resource-id)

  • command list_clone_file
    • can print a fifth timestamp when --print-more-timestamps is set

    • accepts multiple resource paths as input

    • can print resource ids instead of resource paths (--print-resource-ids) in the rightmost column

    • can read resource paths from a file via --read-from-file <file>

    • can read resource paths from stdin (on empty input)

  • command resource_tape has a new parameter --print-tape-barcode-only

  • command size exits with an error if a file has an internal size mismatch that is not visible to the user (this affected 15 of 2 × 10^7 files; it can only occur when the same file is archived multiple times in parallel to the same location)
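The interplay of gfbt -wrid / --write-resource-id with -ao / --append-output and -oo / --overwrite-output can be pictured with the following Python sketch. It is a hypothetical illustration, not the slk_helpers implementation: the file names files_tape_<tape barcode>.txt, files_all.txt and tapes.txt come from the list above, while the function name write_tape_lists, the one-id-per-line format and the sorted order are assumptions; config.sh and the optional files_* outputs are left out.

```python
import os
from typing import Dict, List


def write_tape_lists(ids_by_tape: Dict[str, List[str]], out_dir: str, mode: str = "error") -> None:
    """Write files_tape_<barcode>.txt, files_all.txt and tapes.txt into out_dir.

    mode: "append" (like -ao), "overwrite" (like -oo) or "error" (neither flag:
    refuse to touch existing output files). One resource id per line is assumed.
    """
    if mode not in ("append", "overwrite", "error"):
        raise ValueError(f"unknown mode: {mode}")

    tapes = sorted(ids_by_tape)
    contents: Dict[str, List[str]] = {
        f"files_tape_{tape}.txt": list(ids_by_tape[tape]) for tape in tapes
    }
    contents["files_all.txt"] = [rid for tape in tapes for rid in ids_by_tape[tape]]
    contents["tapes.txt"] = tapes

    # check every target before writing, so nothing is written if one already exists
    if mode == "error":
        existing = [n for n in contents if os.path.exists(os.path.join(out_dir, n))]
        if existing:
            raise FileExistsError("output files already exist: " + ", ".join(existing))

    file_mode = "a" if mode == "append" else "w"
    for name, lines in contents.items():
        with open(os.path.join(out_dir, name), file_mode) as fh:
            fh.writelines(line + "\n" for line in lines)
```

Checking all targets before writing anything keeps a refused run from leaving half-written output behind; whether slk_helpers itself behaves this way is not stated in this changelog.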

Known issues

  • list_search and list_clone_search may list already deleted files (not checked for performance reasons)