HSM News for Oct 2023
pyslk release 1.9.5

pyslk 1.9.5 will be installed in the module `python3/2023.01-gcc-11.2.0` on Oct 17th.
- major changes in pyslk 1.9.3, 1.9.4 and 1.9.5:
  - new function `pyslk.construct_dst_from_src()`, which accepts one or more source files and a destination root path as input and constructs the destination path of each source file
  - added the function `json_str2hsm`, which was not implemented before
  - improved type checking and corrected output types
  - extended deprecation warnings of functions
  - updated error types and error messages
  - major updates to the pyslk documentation (https://hsm-tools.gitlab-pages.dkrz.de/pyslk/index.html)
- detailed changes: https://hsm-tools.gitlab-pages.dkrz.de/pyslk/changelog.html
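To illustrate the idea behind `pyslk.construct_dst_from_src()`, here is a minimal sketch of such a path mapping. This is a hypothetical re-implementation for illustration only; the `src_root` parameter and the list return type are assumptions, not the real pyslk signature (see the pyslk documentation linked above for the actual API):

```python
import os

def construct_dst_from_src(src_files, dst_root, src_root="/"):
    """Illustrative sketch: map each source file to a destination path
    under dst_root, preserving the path relative to src_root.
    (Hypothetical; not the actual pyslk implementation.)"""
    dst_paths = []
    for src in src_files:
        rel = os.path.relpath(src, src_root)
        dst_paths.append(os.path.join(dst_root, rel))
    return dst_paths

# Example with placeholder paths: retrieve files archived under
# /arch/xz1234 into /work/xz1234
print(construct_dst_from_src(
    ["/arch/xz1234/exp1/data.nc"], "/work/xz1234", src_root="/arch/xz1234"
))  # → ['/work/xz1234/exp1/data.nc']
```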
The next release will be 2.0.0; it will be installed in Nov/Dec 2023.
slk retrieval wrapper
We provide a few wrapper scripts for `slk retrieve` and `slk recall` as part of the `slk` module on Levante. The core wrapper script is called `slk_wrapper_recall_wait_retrieve`. The argument `--help` prints details on the usage:
```
$ slk_wrapper_recall_wait_retrieve --help
slk_wrapper_recall_wait_retrieve <account> <source_path> <destination_path> <suffix_logfile>

useful log files:
  slk log file:     ~/.slk/slk-cli.log
  wrapper log file: rwr_log_<suffix_logfile>.log
```
`<account>` has to be a DKRZ project account with allocated compute time. Your account has to be allowed to run SLURM jobs on Levante.
`<source_path>` can be a search id, a path pointing to a namespace or a path pointing to a resource. The wrapper script automatically starts recursive recalls and retrievals. However, it does not split the files by tape. If you wish to combine this wrapper with `slk_helpers group_files_by_tape`, please have a look into this example.
`<destination_path>`: destination path of the retrieval.
`<suffix_logfile>`: the script automatically creates a log file `rwr_log_<suffix_logfile>.log` into which relevant output from this script and from child scripts is written.
What does this script do? If the files do not need to be recalled because they are already stored in the HSM cache, a retrieval is started directly. Otherwise, the script submits a SLURM job which runs `slk recall`. The ID of the StrongLink recall job is extracted and a new "waiter job" is submitted to SLURM with a delay of one hour. After one hour this "waiter job" starts and checks the status of the recall job with the given ID. If the recall job …
… was successful, a retrieval job is started to copy the file(s) from HSM cache to the Lustre filesystem.
… failed, error information is printed to the log file and the script terminates.
… is still running or queued, the waiter job submits itself again with another hour of delay.
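The waiter-job logic above can be sketched as a small decision function. This is a hedged Python illustration of the control flow only; the status strings, callback parameters and return values are assumptions for the sketch, not the wrapper's actual implementation (which is a shell script driving SLURM):

```python
def waiter_step(recall_status, start_retrieval, resubmit, log):
    """One iteration of the "waiter job": inspect the StrongLink recall
    job status and decide what happens next (illustrative sketch only)."""
    if recall_status == "SUCCESSFUL":
        # recall finished: copy the file(s) from the HSM cache to Lustre
        start_retrieval()
        return "retrieve_started"
    if recall_status == "FAILED":
        # recall failed: report and terminate
        log("recall job failed; see ~/.slk/slk-cli.log for details")
        return "aborted"
    # recall still queued or running: re-submit the waiter with a delay
    resubmit(delay_hours=1)
    return "waiting"

# Simulated run: a recall that is first queued, then succeeds
events = []
waiter_step("QUEUED", lambda: events.append("retrieve"),
            lambda **kw: events.append("resubmit"), events.append)
waiter_step("SUCCESSFUL", lambda: events.append("retrieve"),
            lambda **kw: events.append("resubmit"), events.append)
print(events)  # → ['resubmit', 'retrieve']
```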
If the StrongLink system is under high load and many timeouts occur, the `slk_wrapper_recall_wait_retrieve` wrapper might fail in between.
This example on using the wrapper script in combination with