improved retrieval workflow v01#

file version: 04 Feb 2025

current software versions: slk_helpers version 1.13.2; slk wrappers 2.0.1

quick start#

Please create a new directory and change into it: the next commands will create many new text files.

module load slk/3.3.91_h1.13.2_w2.0.1
slk_helpers gfbt <source files> -wf1 <local destinationPath>
start_recall_watcher.sh <DKRZ project slurmJobAccount>
start_retrieve_watcher.sh <DKRZ project slurmJobAccount>

Check the retrieve.log and recall.log files. Check the tapes_error.txt and files_error.txt files and report issues to beratung@dkrz.de.

If the recall and retrieve watchers stop, die or are aborted, you can resume the whole process by running the start_*.sh scripts again. Under normal conditions, the gfbt command should not be run again. If you decide to run gfbt again, please clean up the working folder first, or create a new folder and run the whole command chain there.

A detailed example is given at the end.

new / extended commands#

Two new commands are provided:

  • new command:
    • slk_helpers retrieve

    • slk_helpers recall

  • extended existing command:
    • slk_helpers group_files_by_tape / slk_helpers gfbt

The slk_helpers retrieve command only copies files from the cache to the user, and the slk_helpers recall command only copies files from tape to the cache. Thus, if a file should be retrieved from tape to the Lustre filesystem, the user has to run slk_helpers recall first and slk_helpers retrieve afterwards. At first glance, this might make retrievals look more complicated than the old slk retrieve. However, as will be explained below, these new commands are much easier to use in automated workflows.
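
For illustration, the two-step pattern looks roughly like this (a sketch only; the exact calls with file lists and resource ids are shown in the workflow below):

slk_helpers recall /arch/<project>/<path>/file.nc
# wait until the printed recall job id reports a finished job (slk_helpers job_status <jobId>)
slk_helpers retrieve -ns --destinationPath <local destinationPath> /arch/<project>/<path>/file.nc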

The improved version of the gfbt command generates a set of files in a local directory which are quite useful for automated retrieval workflows based on the two new slk_helpers commands from above. This command is needed whenever more than a handful of files has to be retrieved. Details on these files are given further below. Multiple new parameters were added for fine-grained control of gfbt and of the file creation. To simplify the usage, one central new flag -wf1 <retrieval destinationPath> was introduced (long version: --retrieval-workflow-1 <...>).

If gfbt -wf1 <...> is run in a local folder where one or more of these files already exist, the command will exit with an error. It is also possible to overwrite old files (--overwrite-output) or to append to existing files (--append-output).
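
For illustration, the three modes might be invoked as follows (same placeholders as above):

slk_helpers gfbt <source files> -wf1 <local destinationPath>                      # fails if output files already exist
slk_helpers gfbt <source files> -wf1 <local destinationPath> --overwrite-output   # replaces existing output files
slk_helpers gfbt <more source files> -wf1 <local destinationPath> --append-output # extends existing output files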

new scripts#

Four new scripts are provided for recalling and retrieving data. Two of them are the actual work horses:

  • recall_watcher.sh

  • retrieve_watcher.sh

and two scripts just start these work horses:

  • start_recall_watcher.sh

  • start_retrieve_watcher.sh

Both start_* scripts check some preconditions and then submit their respective work-horse script as a SLURM job. When a work-horse script finishes, it resubmits itself with a delay. The work-horse scripts write logging information into the log files recall.log and retrieve.log; their SLURM logs go into the subfolder logs. These scripts cannot start if slk_helpers gfbt -wf1 <...> has not been run previously in the same local folder from which they are started, because they need files created by gfbt. A central required file is config.sh, but there are more required files.
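
The self-resubmission of the work-horse scripts roughly follows the pattern below (a strongly simplified sketch, not the actual DKRZ scripts; the real scripts read their settings from config.sh):

#!/bin/bash
#SBATCH --job-name=recall_watcher
#SBATCH --output=logs/%x_%j.log
# ... one round of work: check tape states, submit recalls, append to recall.log ...
# resubmit this script with a delay of 600 seconds so that it keeps watching
sbatch --begin=now+600 recall_watcher.sh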

new workflow#

The general idea of the workflow is that one recall is run per tape but only one retrieval is run for all files.

Why one recall per tape?#

First, this makes the tape access as efficient as possible because each tape has to be accessed only once.

Second, this does not overload the StrongLink system with multiple requests for one tape or with requests targeting too many tapes at once. Details on why this is bad are given here: https://docs.dkrz.de/doc/datastorage/hsm/retrievals.html#aggregate-file-retrievals (TODO).

Why can't we do this with the existing slk commands?#

Key points:

  • slk recall and slk retrieve do not accept file lists but only a single file, a namespace or a search id. In the past, file lists had to be provided via searches / search ids, which can be slow for long file lists. slk_helpers recall and slk_helpers retrieve accept a plain file list, and a user can also pipe files into the commands.

  • slk_helpers recall and slk_helpers retrieve accept resource ids as input in addition to resource paths, which considerably shortens the calls of these commands.

  • slk recall and slk retrieve are made for interactive use. Non-interactive use in scripts is very complicated and error-prone.

  • slk retrieve automatically starts recalls for files which are not in the cache. This is practical when only one file is needed but can cause issues when a large dataset is requested. It does not allow us to “get everything we need that is currently in the cache”.

  • slk_helpers recall and slk_helpers retrieve offer parameters that make these commands only fetch files which do not already exist in a user-provided destinationPath folder.

  • StrongLink has an internal queue for recall jobs. slk recall submits a job, prints the job id to the slk log (~/.slk/slk-cli.log) and waits until the recall job is finished. This is very inefficient because, for example, a SLURM job has to run for the whole time the recall job runs. The same holds for slk retrieve. Instead, the new slk_helpers recall works like sbatch: it submits a recall job, prints the job id to the terminal and quits. A user (or script) can then check via slk_helpers job_status whether the job is still running or not (see the sketch after this list).
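
This sbatch-like behaviour enables a simple submit-then-poll pattern (a sketch based on the commands above; possible job states are shown further below):

job_id=$(cat files_tape_<tapeBarcode>.txt | slk_helpers recall --resource-ids)
while [ "$(slk_helpers job_status "$job_id")" = "PROCESSING" ]; do
    sleep 60    # no SLURM job has to idle while the recall job runs inside StrongLink
done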

What does the new workflow look like?#

First of all: DKRZ provides two scripts which perform most of the steps described below automatically. A user only needs to run slk_helpers gfbt and then start the two starter scripts.

The user first runs slk_helpers group_files_by_tape:

slk_helpers group_files_by_tape <source files> -wf1 <local destinationPath> -v

When -v is set, verbose output is printed. The command might take a while in some situations; having the verbose option activated helps with staying calm and patient.

The command creates multiple files:

  • config.sh: various environment variables used for configuring the recall and retrieve watcher scripts

  • files_all.txt: list of all files to be retrieved; used by retrieve_watcher.sh

  • files_cached.txt: list of all cached files; not explicitly used

  • files_multipleTapes.txt: list of all HPSS files split across two tapes; these require special treatment; used by recall_watcher.sh after all tapes in tapes.txt have been processed

  • files_notStored.txt: list of files which exist as metadata entries but do not have a real file attached; there were a few files archived by HPSS for which this was the case

  • files_ignored.txt: list of files which are ignored for the retrieval because they exist already in the local destinationPath

  • files_tape_<TAPE_BARCODE>.txt: list of all files which should be recalled from the tape with the given TAPE_BARCODE; used by recall_watcher.sh

  • tapes.txt: list of all tapes from which data should be recalled; used by recall_watcher.sh; a corresponding file files_tape_<TAPE_BARCODE>.txt has to exist for each tape barcode in this list
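
A quick sanity check of the generated lists can be done with standard tools, for example:

wc -l tapes.txt files_all.txt files_cached.txt    # number of tapes and of files to handle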

Now, we need to run slk_helpers recall for each tape. First, we get a list of tapes:

$ cat tapes.txt
<tapeBarcode1>
<tapeBarcode2>
<tapeBarcode3>
<tapeBarcode4>
...

Second, we should check whether the tapes are actually available:

$ slk_helpers tape_status <tapeBarcode1>
AVAILABLE

$ slk_helpers tape_status <tapeBarcode2>
BLOCKED

$ slk_helpers tape_status <tapeBarcode3>
ERRORSTATE

$ slk_helpers tape_status <tapeBarcode4>
AVAILABLE

Data can be requested from AVAILABLE tapes.

BLOCKED tapes are currently used by other jobs. In order to prevent StrongLink from becoming slow, the new slk_helpers recall allows only one active job per tape (see for details: https://docs.dkrz.de/doc/datastorage/hsm/retrievals.html#aggregate-file-retrievals). Recall jobs submitted for a BLOCKED tape will fail.

Please inform us when one of your tapes is in an ERRORSTATE. Commonly, this points to a warning flag set in the metadata or to an inconsistency in the tape metadata. Some of these error states can be resolved by the StrongLink admins at DKRZ; others have to be reset by the StrongLink support. Although DKRZ staff reset such tape states regularly or request this from the StrongLink support, some tapes might be overlooked because StrongLink does not allow searching for all tapes in such a state.
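
Instead of checking each tape by hand, the status check can be scripted (a sketch based on the commands above):

for tape in $(cat tapes.txt); do
    echo "${tape}: $(slk_helpers tape_status ${tape})"
done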

Third, we actually run slk_helpers recall for AVAILABLE tapes:

$ cat files_tape_<tapeBarcode1>.txt | slk_helpers recall --resource-ids
<jobIdA>

$ cat files_tape_<tapeBarcode4>.txt | slk_helpers recall --resource-ids
<jobIdB>

...

No more than four recalls should run at once. For certain tape types, only two should run in parallel because the number of available tape drives is very low. A user can check the running state of a job via slk_helpers job_status:

$ slk_helpers job_status <jobIdA>
SUCCESSFUL

$ slk_helpers job_status <jobIdB>
PROCESSING

When a job has the status FAILED, it should be resubmitted. If it fails multiple times, please contact the DKRZ Beratung.
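
A scripted version of this step could submit one recall per AVAILABLE tape while respecting the limit of four parallel recalls (a simplified sketch; the provided recall_watcher.sh implements a more robust version of this logic):

count=0
for tape in $(cat tapes.txt); do
    if [ "$(slk_helpers tape_status ${tape})" = "AVAILABLE" ]; then
        job_id=$(cat files_tape_${tape}.txt | slk_helpers recall --resource-ids)
        echo "tape ${tape}: recall job ${job_id}"
        count=$((count + 1))
    fi
    if [ ${count} -ge 4 ]; then
        break    # poll the submitted jobs with slk_helpers job_status before submitting more
    fi
done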

In parallel to the running recall processes, the retrieval can be run:

cat files_all.txt | slk_helpers retrieve --resource-ids -ns --destinationPath <local destinationPath> -vv

Since -vv is set, doubly verbose output will be printed. This command will retrieve all files which are already in the cache and not yet present in the local destinationPath. We recommend setting -ns, which reconstructs the full archival path below the local destinationPath; this flag is implicitly assumed by slk_helpers gfbt -wf1 <...>.

This command can be run repeatedly until all requested files have been retrieved. The command returns exit code 2 if a general error occurs and exit code 3 if a timeout occurs. As long as at least one file is not cached yet, the command returns exit code 1. This makes the command easy to use in a script / automated workflow (see the sketch after this list):

  • exit code 0: all files successfully retrieved or already present in the local destinationPath

  • exit code 1: re-run slk_helpers retrieve

  • exit code 2: stop retrieving and check the error message

  • exit code 3: wait a bit and submit slk_helpers retrieve again
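
These exit codes translate directly into a simple retry loop (a sketch; retrieve_watcher.sh implements a more complete version of this logic):

while true; do
    cat files_all.txt | slk_helpers retrieve --resource-ids -ns --destinationPath <local destinationPath>
    case $? in
        0) break ;;        # done: everything retrieved or already present
        1) sleep 600 ;;    # some files are not cached yet: wait for the recalls, then re-run
        2) exit 1 ;;       # general error: stop and check the error message
        3) sleep 60 ;;     # timeout: wait a bit and re-run
    esac
done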

usage example#

We want to get CCLM forcing made from ERA5 data for the years 1973, 1974 and 1975. We are in project ab1234 and want to retrieve the data to /work/ab1234/forcing. Please create a new directory and change into it: the next commands will create many new text files.

Zero, load the appropriate slk module:

module load slk/3.3.91_h1.13.2_w2.0.1

First, we run the gfbt command for the years 1974 and 1975 only because we forgot that we also need 1973:

$ slk_helpers gfbt -R /arch/pd1309/forcings/reanalyses/ERA5/year1974 /arch/pd1309/forcings/reanalyses/ERA5/year1975 -wf1 /work/ab1234/forcing -v
# command line output is given further below for the interested reader

The output ends with a nice summary of how many files need to be recalled from which tapes. We realize that 1973 is missing and simply let gfbt append the information to the generated files (--append-output):

$ slk_helpers gfbt -R /arch/pd1309/forcings/reanalyses/ERA5/year1973 -wf1 /work/ab1234/forcing --append-output -v
# command line output is given further below for the interested reader

Please notify us when you see tapes in ERRORSTATE.

Now, there should be multiple new files in the current directory. Please remain in this directory and proceed.

Next, we submit the watcher scripts:

$ start_recall_watcher.sh ab1234
successfully submitted recall watcher job with SLURM job id '1234567'

$ start_retrieve_watcher.sh ab1234
successfully submitted retrieve watcher job with SLURM job id '1234568'

Check the retrieve.log and recall.log files. Check the tapes_error.txt and files_error.txt files and report issues to beratung@dkrz.de.

That's it!

Command line output of the first gfbt command:

$ slk_helpers gfbt -R /arch/pd1309/forcings/reanalyses/ERA5/year1974 /arch/pd1309/forcings/reanalyses/ERA5/year1975 -wf1 /work/ab1234/forcing -v
progress: generating file grouping based on search id 826348 in preparation
progress: generating file grouping based on search id 826348 (for up to 190 files) started
collection storage information for search id 826348 started
Number of pages with up to 1000 resources per page to iterate: 1
collection storage information for search id 826348 finished
creating and returning object to host resource storage information
progress: generating file grouping based on search id 826348 (for up to 190 files) finished
progress: getting tape infos for 51 tapes started
progress: getting tape infos for 51 tapes finished
progress: extracting tape stati for 51 tapes started
progress: extracting tape stati for 51 tapes finished
------------------------------------------------------------------------------
progress: updating tape infos for 51 tapes started
progress: updating tape infos for 51 tapes finished
progress: extracting tape stati for 51 tapes started
progress: extracting tape stati for 51 tapes finished
------------------------------------------------------------------------------
    cached (AVAILABLE  ): 23
M24350M8 (BLOCKED    ): 2
M24365M8 (AVAILABLE  ): 3
M24366M8 (AVAILABLE  ): 2
M21306M8 (AVAILABLE  ): 2
M21307M8 (AVAILABLE  ): 1
M21314M8 (ERRORSTATE): 4
M21315M8 (AVAILABLE  ): 1
M24390M8 (AVAILABLE  ): 1
M24391M8 (AVAILABLE  ): 1
M24280M8 (AVAILABLE  ): 3
M21336M8 (AVAILABLE  ): 2
M21341M8 (AVAILABLE  ): 2
M21344M8 (AVAILABLE  ): 1
M21345M8 (AVAILABLE  ): 5
M21342M8 (BLOCKED    ): 8
M22372M8 (BLOCKED    ): 1
M21348M8 (BLOCKED    ): 3
M21349M8 (BLOCKED    ): 3
M21346M8 (AVAILABLE  ): 3
M21347M8 (AVAILABLE  ): 2
M21350M8 (AVAILABLE  ): 1
M24294M8 (AVAILABLE  ): 1
M24295M8 (AVAILABLE  ): 1
M24173M8 (AVAILABLE  ): 3
M22509M8 (AVAILABLE  ): 1
M21360M8 (AVAILABLE  ): 1
M21358M8 (AVAILABLE  ): 1
M21362M8 (AVAILABLE  ): 5
M21363M8 (AVAILABLE  ): 3
M32623M8 (AVAILABLE  ): 7
M21369M8 (AVAILABLE  ): 1
M32621M8 (AVAILABLE  ): 7
M32626M8 (AVAILABLE  ): 11
M32627M8 (AVAILABLE  ): 10
M22395M8 (AVAILABLE  ): 4
M32630M8 (ERRORSTATE): 3
M24320M8 (AVAILABLE  ): 1
M24321M8 (AVAILABLE  ): 3
M32631M8 (AVAILABLE  ): 7
M22655M8 (AVAILABLE  ): 3
M24324M8 (AVAILABLE  ): 1
M32635M8 (AVAILABLE  ): 10
M24325M8 (AVAILABLE  ): 1
M32632M8 (AVAILABLE  ): 8
M22659M8 (AVAILABLE  ): 3
M32638M8 (AVAILABLE  ): 8
M21385M8 (AVAILABLE  ): 1
M32636M8 (AVAILABLE  ): 4
M24202M8 (AVAILABLE  ): 1
M32640M8 (ERRORSTATE): 4
M32377M8 (AVAILABLE  ): 2
------------------------------------------------------------------------------

Command line output of the second gfbt command:

progress: generating file grouping based on search id 826349 in preparation
progress: generating file grouping based on search id 826349 (for up to 95 files) started
collection storage information for search id 826349 started
Number of pages with up to 1000 resources per page to iterate: 1
collection storage information for search id 826349 finished
creating and returning object to host resource storage information
progress: generating file grouping based on search id 826349 (for up to 95 files) finished
progress: getting tape infos for 43 tapes started
progress: getting tape infos for 43 tapes finished
progress: extracting tape stati for 43 tapes started
progress: extracting tape stati for 43 tapes finished
------------------------------------------------------------------------------
progress: updating tape infos for 43 tapes started
progress: updating tape infos for 43 tapes finished
progress: extracting tape stati for 43 tapes started
progress: extracting tape stati for 43 tapes finished
------------------------------------------------------------------------------
M24277M8 (AVAILABLE  ): 2
M24339M8 (AVAILABLE  ): 1
M24280M8 (AVAILABLE  ): 3
M21336M8 (AVAILABLE  ): 1
M24278M8 (AVAILABLE  ): 5
M22422M8 (AVAILABLE  ): 1
M24279M8 (AVAILABLE  ): 1
M21340M8 (AVAILABLE  ): 1
M24221M8 (AVAILABLE  ): 1
M21345M8 (AVAILABLE  ): 2
M24350M8 (BLOCKED    ): 1
M21342M8 (BLOCKED    ): 2
M24351M8 (AVAILABLE  ): 1
M24223M8 (AVAILABLE  ): 1
M22372M8 (BLOCKED    ): 5
M21349M8 (AVAILABLE  ): 4
M21346M8 (AVAILABLE  ): 2
M21347M8 (AVAILABLE  ): 2
M21350M8 (AVAILABLE  ): 1
M24294M8 (AVAILABLE  ): 1
M32016M8 (AVAILABLE  ): 1
M24366M8 (AVAILABLE  ): 1
M21363M8 (AVAILABLE  ): 3
M32623M8 (AVAILABLE  ): 5
M21305M8 (AVAILABLE  ): 1
M32621M8 (AVAILABLE  ): 3
M32626M8 (AVAILABLE  ): 5
M32627M8 (AVAILABLE  ): 5
M24379M8 (AVAILABLE  ): 1
M22395M8 (AVAILABLE  ): 2
M32630M8 (ERRORSTATE): 1
M32631M8 (AVAILABLE  ): 5
M22655M8 (AVAILABLE  ): 3
M32635M8 (AVAILABLE  ): 6
M21314M8 (ERRORSTATE): 2
M32632M8 (AVAILABLE  ): 1
M32638M8 (AVAILABLE  ): 4
M24390M8 (AVAILABLE  ): 1
M32636M8 (AVAILABLE  ): 2
M24391M8 (AVAILABLE  ): 1
M21322M8 (AVAILABLE  ): 1
M32640M8 (ERRORSTATE): 2
M32119M8 (AVAILABLE  ): 1
------------------------------------------------------------------------------