HSM tools: February 2025 updates#

New versions of slk_helpers (1.13.2), pyslk (2.2.0) and the slk_wrappers (2.0.1) are available, which contain several new features and bugfixes. The major changes and improvements are presented here. Details on minor changes can be found in the slk_helpers changelog and the pyslk changelog.

New retrieval workflow: testphase started#

We’re excited to announce new commands and scripts designed to simplify data retrieval, following a thorough testing phase with multiple users. We’re aware that large-scale usage could reveal unforeseen challenges, particularly with StrongLink. Therefore, we’ll be actively monitoring system performance and encourage users to provide feedback. This experimental workflow is a collaborative effort, and we’re committed to making it the best possible solution. We’re also working on additional tools to further enhance data transfer to and from the tape archive. If necessary, based on performance data and user feedback, we may adjust or even temporarily deactivate this experimental workflow to ensure system stability and optimal performance.

We provide two simplified workflows for file retrievals:

  • get fewer than five files

  • get an arbitrary number of files

Workflow (2) creates many files in order to organize the proper and efficient recall and retrieval of the requested files. This might be annoying if only one or two files are to be recalled. Therefore, workflow (1) was set up, which leaves considerably fewer files to be cleaned up.

Get few files

module load slk/3.3.91_h1.13.2_w2.0.1
file1=/arch/ab1234/...
file2=/arch/cd5678/...
destination=/work/ab1234/data/...
slurm_job_account=ab1234

## start recall job: copy from tape to HSM-cache
slk_helpers recall ${file1} ${file2} -d ${destination}
# the job ID is returned
# you can check the status of the job by
slk_helpers job_status <job_id>
# if the job failed, please run the same recall command again

## start retrieval job: copy from HSM-cache to the destination as soon as files are back in the cache
##  this command can be run immediately after the previous recall command has been started
slk_helpers retrieve ${file1} ${file2} -d ${destination} --run-as-slurm-job-with-account ${slurm_job_account}
# slurm job is submitted; details are printed on how to stop the job and perform similar tasks
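Since a recall may take a while, it can be convenient to poll the job status in a loop instead of re-running job_status by hand. The helper below is a hypothetical sketch: the status words printed by slk_helpers job_status are not specified here, so both the status command and the strings to match are passed in as arguments rather than hard-coded.

```shell
#!/usr/bin/env bash
# Hypothetical polling helper: runs the given status command repeatedly
# until its output contains one of the given terminal state words.
# Usage: poll_status "<status command>" "<success word>" "<failure word>"
poll_status() {
  local cmd=$1 ok=$2 fail=$3 status
  while true; do
    status=$($cmd)
    case "$status" in
      *"$ok"*)   echo "done: $status";   return 0 ;;
      *"$fail"*) echo "failed: $status"; return 1 ;;
      *)         sleep 30 ;;  # job still running; wait before polling again
    esac
  done
}

# Assumed usage with the job ID returned by 'slk_helpers recall'; the
# status words SUCCESSFUL/FAILED are assumptions, not documented output:
# poll_status "slk_helpers job_status <job_id>" SUCCESSFUL FAILED
```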

Get an arbitrary number of files

module load slk/3.3.91_h1.13.2_w2.0.1
folder1=/arch/ab1234/...
destination=/work/ab1234/data/...
slurm_job_account=ab1234

# create a tmp_folder for this process and change
# into it; many files for structuring the recall
# and retrieval will be created later
mkdir tmp_folder
cd tmp_folder

## generate file-to-tape-mapping
slk_helpers gfbt -R ${folder1} -wf1 ${destination} -v
# '-v' is useful to see where the command is
#       working if it takes longer to finish

## start recall processes
start_recall_watcher.sh ${slurm_job_account}
# * check out the recall status in the recall.log file
# * one recall job will run per tape
# * max. 4 recalls will run in parallel

## start retrieval process
##  this command can be run immediately after the
##   previous recall command has been started
start_retrieval_watcher.sh ${slurm_job_account}
# * check out the retrieval status in the retrieval.log file
# * as soon as a file has been recalled, the retrieval
#       watcher will attempt to retrieve it
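Once the retrieval has finished, it can be reassuring to verify that every requested file actually arrived in the destination. The following is a minimal sketch, assuming you have a plain list of expected file names (one per line); the helper name and the list file are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical completeness check: report files from a list that are
# missing under the destination directory.
# Usage: check_retrieval <expected-list-file> <destination-dir>
check_retrieval() {
  local expected_list=$1 destination=$2 missing=0 name
  while IFS= read -r name; do
    [ -z "$name" ] && continue          # skip empty lines
    if [ ! -e "${destination}/${name}" ]; then
      echo "missing: ${name}"
      missing=$((missing + 1))
    fi
  done < "$expected_list"
  echo "${missing} file(s) missing"
  [ "$missing" -eq 0 ]                  # non-zero exit if anything is absent
}
```

For example, `check_retrieval expected.txt ${destination}` prints each missing file and returns a non-zero exit code if any listed file is absent.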

Changes in command group_files_by_tape#

The command group_files_by_tape / gfbt has been considerably extended to support the new experimental retrieval workflow. Additionally, the command execution has been sped up considerably. In the past, the command ran at least one large search per execution. In some use cases, this caused waiting times of several minutes and, moreover, put high load on the StrongLink system. To improve the situation for the users and for system stability, we deactivated the automatic evaluation of Regular Expressions in the input of gfbt, which required running a search. When you provide a Regular Expression as input to gfbt, please set the parameter --regex.
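For illustration, such an invocation might look as follows. This is a sketch only: the path pattern is made up, the exact placement of the input argument is assumed, and the command only runs on systems where slk_helpers is installed. Quoting the pattern prevents the shell from expanding it before gfbt sees it.

```shell
## explicitly interpret the input as a Regular Expression (assumed example)
slk_helpers gfbt '/arch/ab1234/exp-[0-9]{3}/.*\.nc' --regex -v
```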

Improvements in Verify Jobs#

StrongLink provides so-called verify jobs, which can be used to check whether files were properly transferred from the user to the HSM cache. Access to this feature is provided by the following commands:

slk_helpers submit_verify_job <FILES>
# job id <JOB_ID> is returned

# wait a few minutes
slk_helpers result_verify_job <JOB_ID>
# list of problematic files

The verify jobs as provided here do a quick verification of cached files. The same verification is automatically performed before files are written to tape. Files that are not in the cache cannot be checked.

Following a detailed evaluation of the StrongLink verification process in the previous months, we realized that verify jobs do not detect certain problematic cases (see Verify Jobs: shortcomings). We added new routines to the command result_verify_job, which perform additional file checks after collecting the results of the verify job. This might extend the run time of the command considerably. If you are sure that the problematic cases are not relevant for you (see Verify Jobs: shortcomings), you can deactivate the new checks with the parameter --quick.
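To script the two verify commands end to end, you need the job ID from the output of submit_verify_job. A minimal sketch follows; the exact output format is not specified here, so the extraction rule (last integer in the output) is an assumption, as is the waiting time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the last integer out of a command's output,
# assumed to be the job ID printed by 'slk_helpers submit_verify_job'.
extract_job_id() {
  grep -oE '[0-9]+' | tail -n 1
}

# Assumed end-to-end usage (only works where slk_helpers is installed):
# job_id=$(slk_helpers submit_verify_job /arch/ab1234/... | extract_job_id)
# sleep 300   # wait a few minutes before querying the result
# slk_helpers result_verify_job "${job_id}"
```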