Changelog: slk_helpers v1.13.2
Update from slk_helpers v1.12.10 to 1.13.2
see here for changes from slk_helpers v1.10.2 to 1.12.10
see here for changes from slk_helpers v1.9.7 to v1.10.2
see here for changes from slk_helpers v1.9.5 to v1.9.9
see here for changes from slk_helpers v1.9.3 to v1.9.5
see here for changes from slk_helpers v1.9.0 to v1.9.3
see here for changes from slk_helpers v1.8.10 to v1.9.0
see here for changes from slk_helpers v1.8.2 to v1.8.10
see here for changes from slk_helpers v1.7.4 to v1.8.2
see here for changes from slk_helpers v1.7.1 to v1.7.4
see here for changes from slk_helpers v1.6.0 to v1.7.1
see here for changes from slk_helpers v1.5.8 to v1.6.0
see here for changes from slk_helpers v1.5.7 to v1.5.7
see here for changes from slk_helpers v1.2.x to v1.5.7
Please have a look here for a detailed description of the new features and here for an incremental changelog.
Breaking changes
- gfbt / group_files_by_tape: Regular expressions in the input path are no longer evaluated automatically. Please set the parameter --regex or --evaluate-regex-in-input when the input contains regular expressions.
- Verify jobs: A few rare types of incomplete files are not captured by verify jobs. Therefore, a postprocessing routine was added to result_verify_job to identify such files. This routine may increase the command's runtime considerably if several thousand files are checked. You may set the parameter --quick to skip the additional check, which we do not recommend.
Major changes
- gfbt / group_files_by_tape only evaluates regular expressions in the input if --regex or --evaluate-regex-in-input is set
- gfbt / group_files_by_tape got many new features
- The new commands slk_helpers recall and retrieve are extended versions of the slk recall and slk retrieve commands. They are much easier to use in automated workflows. slk_helpers retrieve starts no automatic recall but only retrieves files from the cache.
- resource_id does the same as exists but accepts multiple resource paths as input
- list_clone_file accepts multiple resource paths as input
- result_verify_job goes through all files checked by the verify job and performs additional file verification checks which the verify job does not
Details on new commands
- resource_id: works like exists but …
  - accepts multiple resource paths as input (provided in the command call, via stdin or via a file with --read-from-file <file>)
  - prints <resource path>: <resource id> or <resource path>: not exists or <resource path>: problem accessing resource (+ throws error)
- recall (development; might change behaviour; please coordinate with DKRZ support prior to usage):
  - starts a recall job for the provided resources and instantly returns the StrongLink recall job id
  - accepts a list of resource paths or resource ids or one search id
  - resources or search id can be piped into the command (e.g.: cat file_list.txt | slk_helpers recall ...)
  - if -d/--destination <dst> is set, only files not present in dst are recalled (files are compared based on size and mtime)
- retrieve (development; might change behaviour; please coordinate with DKRZ support prior to usage):
  - starts retrieval of the provided resources to the dst provided by -d/--destination <dst>
  - accepts a list of resource paths or resource ids or one search id
  - resources or search id can be piped into the command (e.g.: cat file_list.txt | slk_helpers retrieve ...)
  - if -vv is set, detailed output per file is printed
  - files not stored in the cache are not retrieved and no recall is started for them
  - a file listing all resources which could not be retrieved can be written via --write-missing-to-file <output_file>
  - --run-as-slurm-job-with-account <ACCOUNT> creates a SLURM job script for retrieval which re-submits itself automatically until all files are back; if --dry-run is not set, the script is directly submitted as a SLURM job; it does not start a recall but only copies files from the cache to the user
- recall_needed (development; might change behaviour; please coordinate with DKRZ support prior to usage):
  - same parameters as recall; behaves like slk_helpers recall --dry-run
  - checks whether a recall needs to be performed or not (does not check whether a recall is possible or not)
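The three output forms listed for resource_id above lend themselves to simple post-processing in scripts. The following is a minimal sketch, assuming each output line has exactly the form "<resource path>: <value>" as described; the names ResourceStatus and parse_resource_id_line are hypothetical helpers for illustration and are not part of slk_helpers.

```python
from typing import NamedTuple, Optional

# Status texts taken from the changelog description of resource_id output.
NO_ID_STATES = ("not exists", "problem accessing resource")


class ResourceStatus(NamedTuple):
    path: str
    resource_id: Optional[str]  # None when no resource id was reported
    status: str                 # "ok" or one of NO_ID_STATES


def parse_resource_id_line(line: str) -> ResourceStatus:
    """Split one '<resource path>: <value>' line at its last ': '."""
    path, sep, value = line.rstrip("\n").rpartition(": ")
    if not sep:
        raise ValueError(f"unexpected line format: {line!r}")
    if value in NO_ID_STATES:
        return ResourceStatus(path, None, value)
    # Assumption: anything else after the separator is the resource id.
    return ResourceStatus(path, value, "ok")
```

Splitting at the last ": " rather than the first keeps paths intact even if they contain a colon themselves. The real output may differ in details (for example additional whitespace), so the parser should be checked against actual output before use.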
All fixes and new features of existing commands
- general bug fixes:
  - tapes which are not available anymore are ignored by most commands
  - removed leftover debugging comments from previous versions
  - fixed a bug where a regular 0-byte file was not recognized as being available for retrieval
  - command checksum: fixed exit code
- commands job_report and result_verify_job:
  - in the past, non-existing files were ignored; now they are printed
  - StrongLink might shorten the path of files like /arch/blub/test.nc~/test.nc to /arch/blub~/test.nc; this case is now checked for each non-existing file in the output of these commands
  - result_verify_job identifies additional problematic files which are not recognized by verify jobs
- commands group_files_by_tape / gfbt, new arguments:
  - --resource-ids: expect resource ids as input
  - -dst <dst> / --destinationPath <dst>: ignore files which exist already in dst
  - -ns: preserve original namespace in destinationPath
  - -wf1: "workflow 1" => shortcut for --details --count-tapes -ns --write-resource-id --destinationPath <dst>
  - -wrid / --write-resource-id:
    - writes resource ids per tape to text files with names files_tape_<tape barcode>.txt,
    - creates file files_all.txt (all resource ids),
    - creates file tapes.txt (all tapes for which the first type of file is created),
    - creates file config.sh (parameters for watcher scripts);
    - possibly creates file files_multipleTape.txt,
    - possibly creates file files_notStored
    - possibly creates file files_ignored
    - possibly creates file files_cached
  - -ao / --append-output: when -wrid is set and target files already exist, append output to them (error otherwise)
  - -oo / --overwrite-output: when -wrid is set and target files already exist, overwrite them (error otherwise)
  - regular expressions in the input are only evaluated when --regex / --evaluate-regex-in-input is set
  - removed argument -d from gfbt / group_files_by_tape because a user might expect it to be the short version of --destination although it was the short version of --details
- commands list_clone_search and list_clone_file can print resource id instead of resource path (--print-resource-id)
- command list_clone_file
  - can print a fifth timestamp when --print-more-timestamps is set
  - accepts multiple resource paths as input
  - can print resource ids instead of resource paths (--print-resource-ids) in the rightmost column
  - can read resource paths from a file via --read-from-file <file>
  - can read resource paths from stdin (on empty input)
- command resource_tape got parameter --print-tape-barcode-only
- command size exits with an error if a file has an internal size mismatch which is not visible to the user (affected 15 files of 2 x 10^7 files; can only occur when the same file is archived multiple times in parallel to the same location)
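The per-tape files written by group_files_by_tape with -wrid / --write-resource-id (files_tape_<tape barcode>.txt, listed above) can be collected by a small script before further processing. The following is a rough sketch, assuming that each of these files holds one resource id per line; that layout is an assumption and not stated in this changelog, and read_tape_lists is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict, List

PREFIX = "files_tape_"  # file name prefix as given in the changelog


def read_tape_lists(directory: str) -> Dict[str, List[str]]:
    """Map each tape barcode to the entries of its files_tape_<barcode>.txt."""
    result: Dict[str, List[str]] = {}
    for path in sorted(Path(directory).glob(PREFIX + "*.txt")):
        # The barcode is the part of the file name between prefix and ".txt".
        barcode = path.stem[len(PREFIX):]
        # Assumption: one resource id per non-empty line.
        lines = path.read_text().splitlines()
        result[barcode] = [ln.strip() for ln in lines if ln.strip()]
    return result
```

Other files created by -wrid, such as files_all.txt or tapes.txt, do not match the files_tape_ prefix and are skipped by this sketch.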
Known issues
- list_search and list_clone_search may list already deleted files (not checked for performance reasons)