Archivals to tape#
file version: 20 Dec 2024
current software versions: slk version 3.3.91; slk_helpers version 1.13.1; slk wrappers 1.2.2
run slk archive#
slk archive is available on all Levante nodes. Basic examples of slk archive calls are:
# archive one file, absolute path
$ slk archive /work/bm0146/k204221/ /arch/ab01234/c567890/my_data_1/
# archive one file, relative path
$ slk archive /arch/ab01234/c567890/my_data_3/
# archive folder recursively, absolute path
$ slk archive -R /work/bm0146/k204221/some_folder /arch/ab01234/c567890/my_data_4/
# archive folder recursively, relative path, skip hidden files and folders
$ slk archive -x -R some_folder /arch/ab01234/c567890/my_data_5/
# archive multiple files
$ slk archive /arch/ab01234/c567890/my_data_6/
# archive multiple files using wildcards
$ slk archive file_?.nc /arch/ab01234/c567890/my_data_7/
$ slk archive year200[0123].nc /arch/ab01234/c567890/my_data_8/
slk archive may need up to 6 GB of memory (incl. overhead). If you want to archive a file of 1 GB size, you are welcome to do this on the Levante login nodes. If you want to archive multiple GB of data, please run slk archive on the interactive or shared nodes and allocate 6 GB of memory. If slk is killed with a message similar to the following one, too little memory was allocated:
/sw/[...]/bin/slk: line 16: [...] Killed
If you want to run multiple slk archive calls in parallel on one node, please allocate at least 5 GB per call of slk archive. However, we recommend running only one slk archive call, but with many files at once.
Interactive sessions are started with salloc
$ salloc --mem=6GB --partition=interactive --account=<YOUR_PROJECT_ACCOUNT>
$ slk archive /arch/ab0123/c456789
$ exit
See also Run slk in the “interactive” partition and Data Processing on Levante for details.
Different batch scripts for archiving data are provided at the bottom of this document. The script in section archive multiple files is the simplest one and covers most use cases.
# first, adapt source files and target
# then submit:
$ sbatch --account=<YOUR_PROJECT_ACCOUNT> ./
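As a rough orientation before looking at the templates, a minimal batch script for one archival might look like the sketch below. The partition, the 6 GB memory request and the slk module are taken from the text above; the job name, the time limit, the log file name and the placeholder paths /work/xz1234/ex/am/ple and /arch/xz1234/${USER}/ex/am/ple are assumptions that you need to adapt. The archival only starts when the script runs as a SLURM job (i.e. when SLURM_JOB_ID is set).

```shell
#!/bin/bash
#SBATCH --job-name=slk_archive         # hypothetical job name
#SBATCH --partition=shared             # or: interactive
#SBATCH --mem=6GB                      # slk archive may need up to 6 GB
#SBATCH --time=08:00:00                # assumption: adapt to your data volume
#SBATCH --output=slk_archive_%j.log    # %j is replaced by the SLURM job id

# placeholders: adapt source folder and target namespace
SOURCE_DIR=/work/xz1234/ex/am/ple
TARGET_NS=/arch/xz1234/${USER}/ex/am/ple

archive_job() {
    module load slk
    # -x: skip hidden files and folders, -R: recursive, -vv: per-file status
    slk archive -x -R -vv "${SOURCE_DIR}" "${TARGET_NS}"
    local exit_code=$?
    if [ "${exit_code}" -ne 0 ]; then
        >&2 echo "slk archive failed with exit code ${exit_code}; check ~/.slk/slk-cli.log"
    fi
    return "${exit_code}"
}

# only run the archival when the script is executed as a SLURM job
if [ -n "${SLURM_JOB_ID:-}" ]; then
    archive_job
    exit $?
fi
```

The exit code of the job then reflects the exit code of slk archive, so a failed archival is also visible in the SLURM job state.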
slk archive does not always finish successfully. Please check whether it did. Three ways to do this are described in the section check if archival command was successful below.
If slk archive fails, please run the same command again. Missing files and incomplete existing files are archived, complete existing files are skipped. If slk archive fails repeatedly, we recommend running it with -vv and checking the log file for error messages or Java exceptions.
Please consider running slk archive with -x to prevent the archival of hidden files and folders (details). Hidden folders such as .git and .svn might contain many small files, and hidden files may contain login information or similar.
file size#
We recommend a size between 10 GB and 200 GB per archived file. We strongly recommend not to archive files smaller than 1 GB: each archived file consumes at least 1 GB of tape quota. Please pack smaller files into tar balls prior to their archival. packems supports you in this process (see below).
lower limit: It is very inefficient to retrieve a large number of small files from tape compared to a small number of large files of the same total size. This is because, on average, more tapes have to be read and more spooling has to be done per tape when more files are retrieved. This wears out the tapes, and it takes more time to get all small files back.
upper limit: Due to the setup of our current HSM system, the transfer speed between tape drives and HSM cache decreases when files are larger than 200 GB to 250 GB. This might change in the future.
Retrieving 100 files of 100 MB size takes much longer than retrieving one file of 10 GB size. Additionally, the lifetime of the tape(s) is considerably reduced. When a file is read from tape, the tape drive spools the tape to the start of the file, stops, reads the file with increasing speed until the top speed of 300 MB/s is reached and stops towards the end of the file. With 100 small files, this operation is repeated up to 100 times, which stresses the tape and takes time. If the files are stored on multiple tapes, the user additionally has to wait for the tapes to be transported from their shelves to the tape drives.
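To get an impression of how many files in a folder fall below the 1 GB recommendation and how this affects the tape quota, a small helper like the one below could be used. It is a sketch, not a DKRZ tool: it assumes GNU find, treats 1 GB as 10^9 bytes and models the quota rule stated above as max(file size, 1 GB) per file, which is an assumption for files larger than 1 GB. The function name report_small_files is hypothetical.

```shell
# Hypothetical helper (not part of slk): count files below 1 GB in DIR and
# estimate the tape quota they would consume.
# Assumptions: GNU find, 1 GB = 1000000000 bytes,
# quota per file = max(file size, 1 GB).
report_small_files() {
    local dir=$1
    local one_gb=1000000000
    local n_small=0 total=0 quota=0 size
    # find -printf '%s\n' prints the size of each regular file in bytes
    while read -r size; do
        total=$(( total + size ))
        if [ "$size" -lt "$one_gb" ]; then
            n_small=$(( n_small + 1 ))
            quota=$(( quota + one_gb ))
        else
            quota=$(( quota + size ))
        fi
    done < <(find "$dir" -type f -printf '%s\n')
    echo "files below 1 GB: ${n_small}"
    echo "total size [bytes]: ${total}"
    echo "estimated tape quota [bytes]: ${quota}"
}
```

Under this model, 100 files of 100 MB (10 GB in total) count as 100 GB of quota, whereas a single tar ball of about 10 GB containing them counts as about 10 GB.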
When you run slk archive -R to archive a folder with model output of suitable size, there might be hidden files and/or folders which you accidentally archive as well. Hidden folders such as .git and .svn might contain many small files. Please set -x to ignore these files (details).
Please do not archive more than 3 TB with one call of slk archive. There is a high probability that the connection to StrongLink is interrupted if you try to do so, and this probability increases with the size of the archival. No error message is printed to the terminal; the error only appears in the slk log (~/.slk/slk-cli.log):
2022-11-24 11:16:22 INFO Executing command: "archive -R /work/ab1234/c567890/much_data /arch/zy0987/c567890/target"
2022-11-24 11:18:25 ERROR Unexpected exception unexpected end of stream on
... 16 more
2022-11-24 11:18:25 INFO
Archive report
Status: incomplete
Total files uploaded: 0/85083 files [0B/20.3T]
Please search the log for unexpected end of stream on to find such events.
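A small wrapper around grep for this search could look like the sketch below. The function name is hypothetical, and the default log path is the one given above; the line numbers printed by -n help to locate the corresponding archive report.

```shell
# Hypothetical helper: print all lines (with line numbers) of the slk log that
# report an interrupted connection; returns 1 (grep) if there are none.
find_stream_errors() {
    local log=${1:-$HOME/.slk/slk-cli.log}
    grep -n 'unexpected end of stream on' "$log"
}
```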
Please do not run more than two or three slk archive calls in parallel, because running more than that for a long time might cause a high load on multiple StrongLink nodes. This would cause slow system response and a higher probability of connection timeouts for all users.
If you need to archive more than 50 TB
at once, please contact us in advance via .
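If a folder holds more than 3 TB, one way to stay below the limit is to split the file list into several batches and run one slk archive call per batch, one after another. The sketch below greedily fills batches in the order in which the files are listed. The function name, its output format and the default of 3 TB = 3*10^12 bytes are assumptions; GNU stat is required.

```shell
# Hypothetical helper: assign files (one path per line on stdin) to batches
# whose summed size does not exceed MAX_BYTES; prints "<batch_number> <path>".
# A single file larger than MAX_BYTES gets a batch of its own.
split_into_batches() {
    local max_bytes=${1:-3000000000000}   # default: 3 TB (decimal)
    local batch=1 used=0 size path
    while IFS= read -r path; do
        size=$(stat -c %s -- "$path") || return 1
        # start a new batch if this file would exceed the limit
        if [ "$used" -gt 0 ] && [ $(( used + size )) -gt "$max_bytes" ]; then
            batch=$(( batch + 1 ))
            used=0
        fi
        used=$(( used + size ))
        printf '%d %s\n' "$batch" "$path"
    done
}
```

For example, find /work/xz1234/ex/am/ple -type f | split_into_batches > batches.txt writes the assignment to batches.txt; the paths of each batch can then be passed as source arguments to one slk archive call at a time. File names containing newlines are not supported by this sketch.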
pack files#
Files smaller than 1 GB should be packed into tar balls or zip files with a maximum size of 200 GB. We recommend using uncompressed tar balls.
You can create the tar balls manually or use packems to do the job for you. A folder or a list of files is provided as input to packems. packems automatically fills the provided files into one or multiple tar balls with a maximum size of 100 GB. In a next step, it copies the new tar balls into the tape archive. Additionally, packems creates and archives a file INDEX.txt, which lists all packed files and the tar ball each of them is stored in. packems can do all tasks in one step, but we recommend running packing and archiving separately.
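If you prefer to create the tar balls manually, a minimal sketch could look as follows. It packs one folder into a single uncompressed tar ball and writes a simple index file listing its content, loosely mimicking the idea of the INDEX.txt of packems; the function name and the index format are hypothetical and not the packems format. Splitting large folders into several tar balls below 200 GB is not handled here.

```shell
# Hypothetical helper (not packems): pack SOURCE_DIR into one uncompressed
# tar ball and write an index with one "<tar ball> <member>" line per entry.
pack_folder() {
    local source_dir=$1 tarball=$2 index=$3
    local name member
    name=$(basename "$tarball")
    # -C: store paths relative to the parent folder of SOURCE_DIR
    tar -cf "$tarball" -C "$(dirname "$source_dir")" "$(basename "$source_dir")" || return 1
    # list the members of the new tar ball and prefix each with its name
    tar -tf "$tarball" | while IFS= read -r member; do
        printf '%s %s\n' "$name" "$member"
    done > "$index"
}
```

The tar ball and the index must not be written into the folder that is being packed. Afterwards, both files can be archived with slk archive.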
Please do not run packems on the Levante login nodes but on shared or interactive nodes, as described above for slk archive. Because packems uses slk archive for the archival, sufficient memory has to be allocated.
packems: basic archival#
# pack data with packems
# -d ... => local tmp destination of the tar balls
# -S ... => destination of the tar balls in the tape archive
# -o data_a => prefix of the tar ball names
# --no-archive => only pack and no archival yet
# /work/b.../a/0 => folder to pack recursively
$ packems \
-d /scratch/k/k204221/packems \
-S /arch/bm0146/k204221/archived \
-o data_a \
--no-archive \
/work/b.../a/0
# look at what is there
$ ls /scratch/k/k204221/packems
data_a_001.tar data_a_002.tar data_a_003.tar ...
# archive the tar balls and create the index file
# --archive-only => only archive data (packing was done above)
$ packems \
-d /scratch/k/k204221/packems \
-S /arch/bm0146/k204221/archived \
-o data_a \
--archive-only
packems: change tar ball size#
Set the maximum size of the tar balls to 50 GB:
# pack data with packems
# -t ... => target size (a few tar balls might slightly exceed the value)
# -m ... => hard maximum at least if the source files are large
$ packems \
-d /scratch/k/k204221/packems \
-S /arch/bm0146/k204221/archived \
-o data_a \
-t 30 -m 50 \
--no-archive \
packems: list of files as input#
We want to archive all *.nc files from /work/bm0146/k204221/archive_this. For this purpose, we run find ... and pipe its output into packems:
# pack data with packems
# -i - => read input from stdin
$ find /work/bm0146/k204221/archive_this -type f -name '*.nc' | \
packems \
-i - \
-d /scratch/k/k204221/packems \
-S /arch/bm0146/k204221/archived \
-o data_a \
-t 30 -m 50 \
--no-archive
We can also pipe the content of a file into packems:
# pack data with packems
# -i - => read input from stdin
$ cat file_list.txt | \
packems \
-i - \
-d /scratch/k/k204221/packems \
-S /arch/bm0146/k204221/archived \
-o data_a \
-t 30 -m 50 \
--no-archive
owner, group and permissions#
The permissions of the original file are transferred to the archived file. The owner can modify the permissions with this command:
slk chmod <PERMISSIONS> /arch/ab1234/
The owner of the archived file is the archiving user. If files should be handed over to another user, please contact .
The group of the archived file is the default group of the user, i.e. the group is not adapted to the target namespace. The owner of files can change their group with this command:
slk group <GROUP> /arch/ab1234/
Skip and ignore files#
slk archive automatically skips archiving a file when
a file with the same name already exists in the destination location,
both files are equal in size and
both files have the same modification time.
“Modification time” means the mtime timestamp of the file and not the archival or modification time in StrongLink.
Skipping a file is considered as success/not-failed. In this respect, slk archive works the same as rsync.
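The skip rule can be illustrated with a local comparison. The sketch below is not how slk archive implements it; it only compares a local file's size and mtime (via GNU stat) with the size in bytes and the mtime in seconds since the epoch of the archived file, which you would have to obtain yourself. The function name would_be_skipped is hypothetical, and the name check is implied by calling it for the matching destination file.

```shell
# Illustration only: would a file be skipped according to the rule above?
# Arguments: local file, size of the archived file in bytes, mtime of the
# archived file in seconds since the epoch (both obtained by you).
# Returns 0 (skip), 1 (would be archived) or 2 (local file not readable).
would_be_skipped() {
    local file=$1 archived_size=$2 archived_mtime=$3
    local size mtime
    size=$(stat -c %s -- "$file") || return 2
    mtime=$(stat -c %Y -- "$file") || return 2
    [ "$size" -eq "$archived_size" ] && [ "$mtime" -eq "$archived_mtime" ]
}
```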
If the parameter -x is set, slk archive ignores all hidden files and directories. A file or directory is considered hidden when its name starts with a dot, e.g. the folders .git, .ipynb_checkpoints and .ssh or the files .gitignore and .config.
When you run slk archive with -vv, skipped files are listed but ignored files are not.
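Before an archival with -R, you may want to see which entries -x would leave out. The helper below is a sketch that applies the rule stated above (name starts with a dot) using find; it is not part of slk, and the function name is hypothetical.

```shell
# Hypothetical helper: list hidden files and folders below DIR, i.e. entries
# whose name starts with a dot. -prune stops find from descending into
# hidden folders, so each hidden folder is listed once; -mindepth 1
# excludes DIR itself.
list_hidden() {
    local dir=$1
    find "$dir" -mindepth 1 -name '.*' -prune -print
}
```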
check if archival command was successful#
There are three ways to check this: look into the textual output of slk archive, capture the exit code (0: successful archival) or look into the slk log file. Note that slk archive may skip the archival of files. Skipping a file is considered as success/not-failed.
check success: evaluate text output#
slk archive prints Non-recursive Archive completed if the archival of all files was successful and Non-recursive Archive failed if the archival of at least one file failed. Skipped files are not considered as failed.
In the terminal it will look like this:
# archive a new file
$ slk archive /arch/ab0123/c456789
Non-recursive Archive completed
# archive a file which already exists and will be skipped
$ slk archive /arch/ab0123/c456789
Non-recursive Archive completed
# archival will fail because no write permissions in destination location
$ slk archive /arch/no_permissions
Non-recursive Archive failed
You can print the archival status of each file by appending the flag -vv. In this case, you see whether a file was archived successfully (SUCCESSFUL), was skipped (SKIPPED) or could not be archived (FAILED):
# archive a new file
$ slk archive /arch/ab0123/c456789 -vv
SUCCESSFUL
Non-recursive Archive completed
# archive a file which already exists and will be skipped
$ slk archive /arch/ab0123/c456789 -vv
SKIPPED
Non-recursive Archive completed
# archival will fail because no write permissions in destination location
$ slk archive /arch/no_permissions -vv
FAILED
Non-recursive Archive failed
slk archive prints no file status if the source file does not exist, even though -vv is set:
# archival will fail because source file does not exist
$ slk archive /arch/ab0123/c456789 -vv
Non-recursive Archive failed
check success: capture exit code#
Each command returns an exit code when it ends. The exit code is not printed, but the exit code of the most recent command is stored in the variable $?. An exit code of 0 indicates success. Exit codes >0 indicate a failure or have a special meaning depending on the command.
slk archive returns the exit code 0 on success and 1 on failure:
$ slk archive /arch/ab0123/c456789
Non-recursive Archive completed
$ echo $?
0
$ slk archive /arch/ab0123/c456789
Non-recursive Archive failed
$ echo $?
1
All slk commands only return 0 and 1. In contrast, the slk_helpers commands return 0, 1, 2 and 3. The exact meaning differs from command to command. In the case of slk_helpers exists, 0 means yes and 1 means no:
$ slk_helpers exists /.../ > /dev/null 2>&1
$ echo $?
$ slk_helpers exists /.../ > /dev/null 2>&1
$ echo $?
$ slk_helpers exists /.../ > /dev/null 2>&1
$ echo $?
Exit codes are very useful in bash scripts:
slk archive /arch/ab0123/c456789
exit_code=$?
if [ $exit_code -eq 0 ]; then
    echo "archival successful; first try"
    # do some more stuff ...
else
    >&2 echo "error occurred during archival; wait and retry"
    sleep 10
    slk archive /arch/ab0123/c456789
    exit_code=$?
    if [ $exit_code -eq 0 ]; then
        echo "archival successful; second try"
        # do some more stuff ...
    else
        >&2 echo "error occurred during archival; failed twice; exiting"
        exit 1
    fi
fi
check success: slk log file#
slk archive only prints sparse information to the terminal. Most error messages and an archival report are written to the slk log file ~/.slk/slk-cli.log. Please do not give other users read permissions to ~/.slk, because your slk login token is also stored in that folder.
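If you are unsure about the current permissions, the sketch below removes all group and other permissions from ~/.slk and prints the resulting mode. It assumes GNU stat; the function name is hypothetical.

```shell
# Hypothetical helper: remove all permissions of group and others from the
# slk folder (default ~/.slk) and print the resulting mode (GNU stat).
secure_slk_dir() {
    local dir=${1:-$HOME/.slk}
    chmod go-rwx -- "$dir" && stat -c %a -- "$dir"
}
```

For a folder that only its owner could fully access before, the printed mode is 700.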
A successful archival looks like this in the log:
2024-06-12 11:10:53 197384 INFO Executing command: "archive ..."
2024-06-12 11:11:00 197384 INFO Non-recursive Archive completed
Archive report
Status: success
Total files uploaded: 1/1 files [3B/3B]
A failed archival might look similar to this in the log:
2024-06-12 11:17:01 205127 INFO Executing command: "archive ..."
2024-06-12 11:17:02 205127 ERROR Unexpected exception
java.nio.file.NoSuchFileException: file_20.txt
Archive report
Status: incomplete
Total files uploaded: 0/0 files [0B/0B]
Another failed archival might look like this in the log:
2023-03-25 01:34:25 xU22 97259 INFO Executing command: "archive ..."
2023-03-25 01:34:27 xU22 97259 ERROR No active nodes. Shutting down...
2023-03-25 01:34:27 xU22 97259 ERROR Failed to upload resource: [...]
GNS Path: [...]
Error: Code: 500, Reason: CONNECTION_ERROR, Message: Cannot connect to websocket, Detailed Message: Cannot connect to websocket
2023-03-25 01:34:29 xU22 97259 INFO Non-recursive Archive failed
Archive report
Status: incomplete
Total files uploaded: 0/1 files [0B/1.5K]
Total files failed: 1/1 files [0B/1.5K]
Connection Error: 1
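To check the latest archival from a script, you could extract the Status line of the most recent archive report, as in the sketch below. It relies on the log layout shown above (a line Archive report followed by Status: ...); the function name is hypothetical, and the layout might differ between slk versions.

```shell
# Hypothetical helper: print the status ("success", "incomplete", ...) of the
# most recent archive report in the slk log.
last_archive_status() {
    local log=${1:-$HOME/.slk/slk-cli.log}
    # take each "Archive report" line plus the line after it, keep the
    # status values and print the last one
    grep -A 1 'Archive report' -- "$log" \
        | sed -n 's/^[[:space:]]*Status:[[:space:]]*//p' \
        | tail -n 1
}
```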
slk archive failed#
What should you do when slk archive failed? If you are in a hurry, you can try the quick solution. However, if you have a bit more time, it is worth finding out why slk archive failed. Depending on the situation, the quick solution might be appropriate (most situations) or not.
When slk archive fails, one or more files will probably be flagged as partial file. The important facts about such files are:
A file which is not flagged as partial file has been archived completely.
A file which is flagged as partial file may be incomplete or complete.
slk list does not reliably highlight files flagged as partial file.
slk_helpers has_no_flag_partial -v reliably lists all files flagged as partial file.
A normal user cannot remove a partial file flag from a completely archived file.
quick solution#
If the archival was interrupted, please run the same call of slk archive a second time. slk archive will only transfer those files which have not been archived yet, have only been partly archived (internally flagged as partial file) or have been modified since the first archival (see skip rules).
You can run slk archive repeatedly until it succeeds. Afterwards, please check for files flagged as partial file and notify us via to remove the (false) flags (details). Flagged files are blocked for retrieval.
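The quick solution can be automated, for example in a batch job. The sketch below repeats the same slk archive call up to a given number of times and afterwards lists the files that are still flagged as partial file via slk_helpers has_no_flag_partial -R -v. The function name, the argument order (number of attempts first, destination namespace as the last slk archive argument) and the RETRY_WAIT variable are assumptions.

```shell
# Hypothetical helper: re-run the same slk archive call until it succeeds,
# at most MAX_TRIES times, then list files still flagged as partial file.
# Usage: archive_until_success MAX_TRIES [slk archive options] SOURCE... TARGET
archive_until_success() {
    local max_tries=$1
    shift
    local target=${*: -1}   # the last argument is the destination namespace
    local try
    for (( try = 1; try <= max_tries; try++ )); do
        if slk archive "$@"; then
            echo "archival successful after ${try} attempt(s)"
            # re-runs do not remove flags from skipped, already complete files
            slk_helpers has_no_flag_partial -R -v "$target"
            return 0
        fi
        >&2 echo "attempt ${try} of ${max_tries} failed"
        if [ "$try" -lt "$max_tries" ]; then
            sleep "${RETRY_WAIT:-10}"   # seconds to wait before the next attempt
        fi
    done
    >&2 echo "slk archive failed ${max_tries} times; please check ~/.slk/slk-cli.log"
    return 1
}
```

For example, archive_until_success 3 -x -R /work/xz1234/ex/am/ple /arch/xz1234/${USER}/ex/am/ple tries at most three times.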
Find out why archival failed#
You will find the most common reasons for slk archive to fail in the table below. If you experience other reasons for failure, please notify us so that we can extend this table.
reason for failure | solution
manually killed by the user (e.g. via CTRL + C) | re-run same
broken ssh connection | re-run same
timeout of a SLURM job | re-run same
archival of a large amount of data (...) |
killed by the operating system (e.g. allowed memory exceeded) |
connection timeout to StrongLink |
no permissions to write into destination path | obtain permissions
source file(s) does not exist | check source files or command
partial file flag#
Files which are incompletely archived are flagged as partial file. However, completely archived files might also be flagged as partial file when slk archive is interrupted directly after these files were completely archived.
slk list appends (Partial File) in most situations when a file is flagged as partial file. However, when the permissions, ownership, group, path or name of the flagged file were changed, (Partial File) is not printed by slk list. Therefore, please do not use slk list to determine whether a file is flagged as partial file or not. Please use slk_helpers has_no_flag_partial -v for this purpose.
When you run the failed slk archive command again, missing or incomplete files are archived properly and their partial file flag is removed. However, completely archived files are not touched, so the partial file flag is not removed from these files.
Listing incompletely archived files#
Please run slk_helpers has_no_flag_partial to quickly get a list of possibly incompletely archived files and run slk archive again. If these flags persist, please notify us via . We will check the files again and request the StrongLink support to remove the flags.
Please submit a verify job with slk_helpers submit_verify_job and later collect its results with slk_helpers result_verify_job in order to get a list of the actually incompletely archived files.
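When verify jobs are submitted from a script, the job id can be taken from the summary line printed by slk_helpers submit_verify_job. The sketch below parses that line in the form shown in the examples on this page (...; submitted verify job: <id>); the function name is hypothetical. Because the status strings of slk_helpers job_status are not shown here, the sketch does not poll the job; check its status before calling slk_helpers result_verify_job with the printed id.

```shell
# Hypothetical helper: submit a verify job for NAMESPACE (recursively) and
# print the id(s) of the submitted job(s), one per line.
submit_verify() {
    local namespace=$1 output
    output=$(slk_helpers submit_verify_job "$namespace" -R) || return 1
    # keep the number after "submitted verify job: "
    grep -o 'submitted verify job: [0-9]\+' <<< "$output" | grep -o '[0-9]\+$'
}
```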
Background on what happens when slk archive fails#
When slk archive starts to archive files, it first creates a 0-byte file in the destination location for each source file. Each of these 0-byte files is flagged as partial file. The actual size of each source file is stored hidden in StrongLink. When a file has been completely archived, slk and StrongLink need some time until the partial file flag is removed. The time span between completing the archival and removing the flag increases with the amount of transferred data and with the load on the connected StrongLink node. When slk loses the connection to StrongLink before the flag has been removed, the flag remains set.
If a call of slk archive which transfers many files is killed abruptly, each destination file will be in one of these three conditions:
1. file is complete; partial file flag is not set anymore
2. file is complete; partial file flag is still set
3. file is incomplete; partial file flag is still set
slk list appends (Partial File) to each freshly archived file with this flag. However, when the permissions, ownership, group, path or name of the flagged file were changed, (Partial File) is not printed by slk list. Therefore, slk list does not reliably show this flag. You can list all files flagged as partial file in a namespace with slk_helpers has_no_flag_partial -R -v <namespace>.
When slk archive is run a second time, it skips all files which are already in the destination and match in size and modification time. The partial file flag is ignored in this comparison. All other files are archived (again) and their partial file flags are removed. However, the partial file flag is not removed from skipped files because their metadata is not touched at all. Please notify us via when you own such files, even if you think that they are complete.
Therefore, the partial file flag is a necessary but not a sufficient condition for a file being actually incomplete: each incompletely archived file is flagged as partial file, but not each flagged file is incomplete.
Files which are flagged as partial file are blocked for retrieval. A user has no possibility to remove a partial file flag from a completely archived file; this has to be done by the StrongLink support. If you own such files, please contact us via and send us a list of these files. Beforehand, please make sure via slk archive -vv that these files were actually archived completely. This can also be checked via a verify job.
example failed archival#
We want to archive some netCDF files from the current folder to /dkrz_test/techtalk/021
. This archival fails and some files are flagged as partial files.
$ slk archive *.nc /dkrz_test/techtalk/021
# some reason ...
Non-recursive Archive failed
$ slk list /dkrz_test/techtalk/021
... 1.1G ... (Partial File)
... 0 ... (Partial File)
... 1.1G ...
... 1.1G ...
... 1.1G ... (Partial File)
... 1.1G ... (Partial File)
... 0 ... (Partial File)
... 1.1G ... (Partial File)
... 144.4M ... (Partial File)
... 0 ... (Partial File)
Files: 10
If we now modify the permissions of one file, slk list does not print the (Partial File) info for it anymore.
$ slk chmod +r /dkrz_test/techtalk/021/
$ slk list /dkrz_test/techtalk/021
... 1.1G ... (Partial File)
... 0 ... (Partial File)
... 1.1G ...
... 1.1G ...
... 1.1G ... (Partial File)
... 1.1G ... (Partial File)
... 0 ... (Partial File)
... 1.1G ... (Partial File)
... 144.4M ...
... 0 ... (Partial File)
Files: 10
We can simply run the failed archival command again, as shown further below. If this is not possible or fails again, please verify the files as shown next or contact us directly via .
In order to see which files are actually flagged as partial file, we could run slk_helpers has_no_flag_partial -v as described further below. In order to find out which files are actually incomplete, we could run a verify job (see below for details):
# run verify job
$ slk_helpers submit_verify_job /dkrz_test/techtalk/021 -R
Submitting up to 1 verify job(s) based on results of search id 732325:
search results: pages 1 to 1 of 1; visible search results: 10; submitted verify job: 247340
Number of submitted verify jobs: 1
# wait for job to be finished
$ slk_helpers job_status 247340
$ slk_helpers job_status 247340
# collect results when the job has completed
$ slk_helpers result_verify_job 247340
Resource content size does not match record: /dkrz_test/techtalk/021/
Resource content size does not match record: /dkrz_test/techtalk/021/
Resource content size does not match record: /dkrz_test/techtalk/021/
Resource content size does not match record: /dkrz_test/techtalk/021/
Resource content size does not match record: /dkrz_test/techtalk/021/
Erroneous files: 5
The five listed files are defective. The defects of the files *, *, * and * are obvious. However, the file * is also defective, which cannot be seen directly in the output of slk list.
We run slk archive again:
$ slk archive *.nc /dkrz_test/techtalk/021
Non-recursive Archive completed
The file * was also overwritten although, based on its (human-readable) size, it might have seemed complete.
Now, we can list the namespace’s content again:
$ slk list /dkrz_test/techtalk/021
... 1.1G ...
... 1.1G ...
... 1.1G ...
... 1.1G ... (Partial File)
... 1.1G ...
... 1.1G ...
... 1.1G ...
... 1.1G ... (Partial File)
... 1.1G ...
... 1.1G ...
Files: 10
The two files * and *, which had already been completely archived during the first archival and were skipped during the second one, are still flagged as partial file. The command slk_helpers has_no_flag_partial -v returns the same. These flags cannot be removed by users, and retrieval of flagged files is not permitted. Please send us an email to have the flags removed.
Validate archivals#
In this section, we describe methods to identify defective files, e.g. incompletely archived files (partial files). Defective files can be archived again with the same slk archive command with which they were archived in the first place. Please make sure that slk archive finishes correctly. Complete/intact files are automatically skipped. Methods to verify files are:
check if a file is flagged as partial file
run a verify job and collect its results (waiting time possible)
check if a file has already been written to tape
compare the checksums of the source file and of the archived file in StrongLink
When you archive important data, we strongly recommend running at least a verify job after the archival. You can skip this if the files have already been written to tape, because StrongLink performs a basic verification before writing files to tape. Thus, instead of running one or multiple verify jobs, you can simply wait until all files have been written to tape. However, this might take a few days. Additionally, for very important data, the checksums should be compared in order to identify bit flips or issues we are not aware of.
Since January 2022, three incompletely archived files have been written to tape, out of approximately 20 million written files. The reason why these incomplete files were written to tape has been identified, and scans for such files are run on a regular basis. Therefore, we currently assume that a file is correct/complete if it is on tape.
check if file flagged as “partial”#
The only defects that we are aware of in files archived by users from Levante since January 2022 are/were incompletely archived files caused by aborted archivals. Incompletely archived files are flagged as partial file. Hence, checking a file for this flag is a simple way to see whether it might be incomplete. However, completely archived files may also be flagged as partial file when slk archive does not finish properly.
A file flagged as partial file is not necessarily incomplete, but an incomplete file is definitely flagged as partial file. Please use slk_helpers has_no_flag_partial -v to check whether one or multiple files are flagged as partial file:
$ slk_helpers has_no_flag_partial /dkrz_test/netcdf/20230504c -R -v
/dkrz_test/netcdf/20230504c/ has partial flag
/dkrz_test/netcdf/20230504c/ has partial flag
/dkrz_test/netcdf/20230504c/ has partial flag
Number of files without partial flag: 7/10
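To collect the flagged files, e.g. for the list you send to us, the paths can be filtered from the -v output shown above. This sketch relies on the line format <path> has partial flag; the function name and the list file are hypothetical.

```shell
# Hypothetical helper: write the paths reported as "<path> has partial flag"
# into LIST_FILE and print the number of flagged files.
list_partial_files() {
    local namespace=$1 list_file=$2
    slk_helpers has_no_flag_partial "$namespace" -R -v \
        | sed -n 's/ has partial flag$//p' > "$list_file"
    wc -l < "$list_file"   # number of flagged files
}
```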
Please do not trust the output of slk list with respect to the existence of the partial file flag, because the flag might be hidden in some situations:
$ slk list /dkrz_test/netcdf/20230504c
-rwxr-xr-x- k204221 bm0146 553.9M 19 Jul 2021 02:18
-rw-r--r--- k204221 bm0146 553.9M 19 Jul 2021 02:18
-rw-r--r--- k204221 bm0146 553.9M 19 Jul 2021 02:18 (Partial File)
-rw-r--r--- k204221 bm0146 554.0M 19 Jul 2021 02:18 (Partial File)
Files: 4
The (Partial File) info is not displayed if the file was moved or renamed or if the permissions, group or owner of the file were changed. This is a known slk issue.
Please notify us via when you own files flagged as partial file, even if you think they are OK. Please run slk archive -vv ... beforehand. We will run an additional check and request the StrongLink support to remove these flags.
See also: further examples of the usage of slk_helpers has_no_flag_partial are on the page slk usage examples.
run verify job#
Verify jobs can only target files in the cache. StrongLink automatically runs a verify job on each file it wants to write to tape. Files which fail this verification are not written to tape.
Verify jobs can also be started manually by users. They are submitted via the command slk_helpers submit_verify_job and run for a few seconds to a few minutes. The results of the verify job, i.e. a list of incomplete files or files with other issues, are obtained via the command slk_helpers result_verify_job. Verify jobs are submitted to the same queueing system as recall/retrieval jobs and might need to wait if the queue is full.
Please start a verify job as follows:
$ slk_helpers submit_verify_job /dkrz_test/netcdf/20230925a -R
Submitting up to 1 verify job(s) based on results of search id 576002:
search results: pages 1 to 1 of 1; visible search results: 10; submitted verify job: 176395
Number of submitted verify jobs: 1
A verify job with the id 176395 was submitted. It is in the same queue as recall jobs. Thus, if many files are being recalled and the StrongLink queue is well filled, verify jobs might need to wait some time until they are processed.
The command performs a search in the background, which might take some time when StrongLink is under heavy load. You can run the command with -v in order to see at which point the command is waiting for StrongLink.
$ slk_helpers submit_verify_job /dkrz_test/netcdf/20230925a -R -v
Generating search query.
Search query is: '{"$and":[{"path":{"$gte":"/dkrz_test/netcdf/20230925a"}},{"smart_pool":"slpstor"}]}'.
Starting search query.
Search ID is: 576005.
Search continuing. ......
Submitting up to 1 verify job(s) based on results of search id 576005:
Collecting search results from page 1 to page 1
Collecting search results 1 to 1000
Collected 10 search results from page 1 to page 1
Generate verify query
Submit verify query
search results: pages 1 to 1 of 1; visible search results: 10; submitted verify job: 176396
Number of submitted verify jobs: 1
When the verify job has been submitted, please wait until it is finished. The job status is checked as follows:
$ slk_helpers job_status 176395
$ slk_helpers job_status 176395
$ slk_helpers job_status 176395
$ slk_helpers job_status 176395
# wait a few seconds or minutes ...
$ slk_helpers job_status 176395
The results of the verify job can be fetched via slk_helpers result_verify_job
$ slk_helpers result_verify_job 176395
Resource content size does not match record: /dkrz_test/netcdf/20230925a/
Resource content size does not match record: /dkrz_test/netcdf/20230925a/
Resource content size does not match record: /dkrz_test/netcdf/20230925a/
Resource content size does not match record: /dkrz_test/netcdf/20230925a/
Erroneous files: 4
Four size-mismatch errors were detected. In this case, these files should be re-archived or deleted from the archive. Resource content size does not match record is the default error for incompletely archived files and, thus, the most common output. If the result_verify_job command detects an unexpected error or an error which can only be solved by the DKRZ support, it tells the user to notify the DKRZ staff:
$ slk_helpers result_verify_job 247338
Warning: Missing key in the JSON input: attributes.best_store; resource id: 80527401010; resource path: /dkrz_test/techtalk/020/
Warning: Resource has an unclear caching state; resource id: 80527401010; resource path: /dkrz_test/techtalk/020/
Resource content size does not match record: /dkrz_test/techtalk/020/
Resource content size does not match record: /dkrz_test/techtalk/020/
File not found: /dkrz_test/techtalk/020/
Erroneous files: 3 (some errors have to be solved by the DKRZ support; please contact
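To re-archive the files with size mismatches, their paths can be filtered from the output of slk_helpers result_verify_job. The sketch below keeps only lines of the form shown above; other messages (warnings, File not found) are dropped and have to be checked separately. The function name is hypothetical.

```shell
# Hypothetical helper: print the paths of all files for which a verify job
# reported "Resource content size does not match record", i.e. candidates
# for re-archival with the original slk archive command.
size_mismatches() {
    local job_id=$1
    slk_helpers result_verify_job "$job_id" \
        | sed -n 's/^Resource content size does not match record: //p'
}
```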
check if a file is on tape#
StrongLink performs a basic file verification prior to writing a file to tape. Files which fail this verification are not written to tape. You can use the command slk_helpers is_on_tape to check whether a file or all files in a namespace have already been written to tape.
# check a single file which is on tape
$ slk_helpers is_on_tape /arch/bm0146/k204221/iow/INDEX.txt
File is on tape
# check a directory of which all contained files are on tape
$ slk_helpers is_on_tape /arch/bm0146/k204221/iow -R
All files are on tape
# check a directory of which not all contained files are on tape
$ slk_helpers is_on_tape /dkrz_test/techtalk/001 -R
Not all files are on tape.
If you wish to print how many files were checked or to generate a list of files which have not been written to tape yet, please run the command with -v
# check a single file which is on tape
$ slk_helpers is_on_tape /arch/bm0146/k204221/iow/INDEX.txt -R -v
Number of files stored on tape: 1/1
# check a directory of which all contained files are on tape
$ slk_helpers is_on_tape /arch/bm0146/k204221/iow -R -v
Number of files stored on tape: 23/23
# check a directory of which not all contained files are on tape
$ slk_helpers is_on_tape /dkrz_test/techtalk/001 -R -v
/dkrz_test/techtalk/001/file_01.txt is not on tape
/dkrz_test/techtalk/001/file_02.txt is not on tape
/dkrz_test/techtalk/001/file_00.txt is not on tape
Number of files stored on tape: 0/3
If you wish to print all checked files, please run the command with -vv
# check a single file which is on tape
$ slk_helpers is_on_tape /arch/bm0146/k204221/iow/INDEX.txt -R -vv
/arch/bm0146/k204221/iow/INDEX.txt is on tape
Number of files stored on tape: 1/1
# check a directory of which all contained files are on tape
$ slk_helpers is_on_tape /arch/bm0146/k204221/iow -R -vv
/arch/bm0146/k204221/iow/iow_data_002.tar is on tape
/arch/bm0146/k204221/iow/iow_data_001.tar is on tape
/arch/bm0146/k204221/iow/iow_data2_003.tar is on tape
Number of files stored on tape: 23/23
# check a directory of which not all contained files are on tape
$ slk_helpers is_on_tape /dkrz_test/techtalk/001 -R -vv
/dkrz_test/techtalk/001/file_01.txt is not on tape
/dkrz_test/techtalk/001/file_02.txt is not on tape
/dkrz_test/techtalk/001/file_00.txt is not on tape
Number of files stored on tape: 0/3
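Both -v and -vv print one line per file which has not been written to tape yet. Assuming this line format (the path followed by "is not on tape"), a short bash sketch can collect these paths, for example to check them again later; list_not_on_tape is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch (hypothetical helper): read the output of
# "slk_helpers is_on_tape -R -v" (or -vv) from stdin and print the paths of
# all files reported as "... is not on tape", sorted, one per line.
list_not_on_tape() {
    # "/path/to/file is not on tape" -> "/path/to/file"; all other lines
    # (files on tape, summary line) are dropped because of sed -n
    sed -n 's/ is not on tape$//p' | sort
}
```

Example: slk_helpers is_on_tape /dkrz_test/techtalk/001 -R -v | list_not_on_tape > not_on_tape.txt (the output file name is only an example).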
compare checksums#
StrongLink calculates two types of checksums for each file: sha512 and adler32. It may take a few hours after the archival until the checksums are available. If no checksum is available one day after the archival has finished and the file is larger than 0 bytes, please contact the DKRZ support.
The checksums stored in StrongLink are obtained via slk_helpers checksum RESOURCE. The sha512 checksum of a local file is calculated via sha512sum:
# archive a file
$ slk archive /arch/bm0146/k204221/test_data
[========================================\] 100% complete. Files archived: 1/1, [1.7K/1.7K].
# wait some hours ...
# calculate the checksum of the local file
$ sha512sum
# get the checksum of the archived file
$ slk_helpers checksum -t sha512 /arch/bm0146/k204221/test_data/
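Both steps can be combined into one comparison. The bash sketch below assumes, like the checksum template further down on this page, that the first whitespace-separated field printed by slk_helpers checksum -t sha512 is the checksum itself; the function name compare_sha512 is hypothetical.

```shell
#!/bin/bash
# Sketch (hypothetical helper): compare the sha512 checksum of a local file with
# the output of "slk_helpers checksum -t sha512 RESOURCE".
#   $1: local file, $2: output of slk_helpers checksum (first field = checksum)
# Returns 0 if the checksums are equal, 1 if they differ and 2 if the local file
# is not readable or no StrongLink checksum was given.
compare_sha512() {
    local local_file=$1 hsm_output=$2
    local local_sum hsm_sum
    if [ ! -r "$local_file" ]; then
        >&2 echo "cannot read local file '${local_file}'"
        return 2
    fi
    # first field of the first line is taken as the StrongLink checksum
    hsm_sum=$(printf '%s\n' "$hsm_output" | awk 'NR == 1 { print $1 }')
    if [ -z "$hsm_sum" ]; then
        >&2 echo "no StrongLink checksum given (not calculated yet?)"
        return 2
    fi
    local_sum=$(sha512sum "$local_file" | awk '{ print $1 }')
    if [ "$local_sum" = "$hsm_sum" ]; then
        echo "checksums are equal: ${local_sum}"
        return 0
    fi
    echo "checksums differ: local ${local_sum}, StrongLink ${hsm_sum}"
    return 1
}
```

Usage, with my_file.nc as a hypothetical file name: compare_sha512 my_file.nc "$(slk_helpers checksum -t sha512 /arch/bm0146/k204221/test_data/my_file.nc)"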
Archival wrapper for SLURM#
In contrast to slk retrieve, we do not provide SLURM wrapper scripts for slk archive in the slk module on Levante. Instead, you will find several SLURM script templates for archivals below.
Archival script templates#
Several script templates for different use cases are printed below and available for download:
archive multiple files
several archivals of single files:
archival of one file and checksum check:
When you use these templates, you need to make a few adaptations (not every script contains all of them):
modify src_folder: replace /work/xz1234/ex/am/ple by the actual source folder on the Lustre file system
modify target_folder: replace /arch/xz1234/${USER}/ex/am/ple by a path appropriate for your project
modify … by a correct …
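The first two adaptations can also be done with sed. A minimal sketch, assuming the templates contain the placeholder folders exactly as written above; the helper adapt_template and its argument order are hypothetical, and all other adaptations still have to be done by hand.

```shell
#!/bin/bash
# Sketch (hypothetical helper): write a copy of a downloaded template in which the
# placeholder folders are replaced by your own folders.
#   $1: template, $2: adapted output script, $3: source folder, $4: target folder
# Assumption: the folder names contain neither '|' nor '&' (both special for sed here).
adapt_template() {
    local template=$1 output=$2 src=$3 target=$4
    if [ ! -f "$template" ]; then
        >&2 echo "template not found: '${template}'"
        return 1
    fi
    # '|' is used as sed delimiter because the paths contain '/'. The target
    # placeholder contains a literal '${USER}'; its '$' is escaped for sed.
    sed -e "s|/work/xz1234/ex/am/ple|${src}|g" \
        -e "s|/arch/xz1234/\\\${USER}/ex/am/ple|${target}|g" \
        "$template" > "$output"
}
# Usage (hypothetical file and folder names):
#   adapt_template template.sh my_archival.sh /work/ab0123/my_data /arch/ab0123/${USER}/my_data
```

Please check the adapted script before submitting it.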
Please run/submit these scripts via sbatch as described in Run slk as batch job and SLURM Introduction.
archive multiple files#
#!/bin/bash
# * You can change the values right of the "=" as you wish.
# * The "%j" in the log file names means that the job id will be inserted
#SBATCH --job-name=arch_files # Specify job name
#SBATCH --output=test_job.o%j # name for standard output log file
#SBATCH --error=test_job.e%j # name for standard error output log
#SBATCH --partition=shared # partition name
#SBATCH --ntasks=1 # max. number of tasks to be invoked
#SBATCH --time=08:00:00 # Set a limit on the total run time
#SBATCH --mem=6GB
# make 'module' available when script is submitted from certain environments
source /sw/etc/profile.levante
# ~~~~~~~~~~~~ preparation ~~~~~~~~~~~~
module load slk
# set the source folder
src_folder=/work/xz1234/ex/am/ple
# set target folder for archival
target_folder=/arch/xz1234/${USER}/ex/am/ple
# ~~~~~~~~~~~~ archivals ~~~~~~~~~~~~
# do the archival
echo "doing 'slk archive'"
# ~~~~~~~~~~~~ doing multi-file archival ~~~~~~~~~~~~
# You can archive multiple files at once -- either by listing them or by
# using wildcard expressions.
slk archive -vv /${src_folder}/ ${src_folder}/ ${src_folder}/*.tar ${target_folder}
if [ $? -ne 0 ]; then
    >&2 echo "an error occurred in slk archive call"
else
    echo "archival of two files successful"
fi
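If a wildcard expression matches no file, bash passes the pattern on unchanged to slk archive. The following sketch (hypothetical helper collect_files) expands the pattern with the bash option nullglob instead, so that a script can stop before calling slk archive when nothing matches.

```shell
#!/bin/bash
# Sketch (hypothetical helper): expand a wildcard pattern inside a folder and
# print the matching files, one per line. With nullglob, a pattern without
# matches expands to nothing, so the helper fails early instead of passing
# the literal pattern on to slk archive.
collect_files() {
    local folder=$1 pattern=$2
    local restore_nullglob
    local -a files
    restore_nullglob=$(shopt -p nullglob || true)   # remember the current setting
    shopt -s nullglob
    files=( "$folder"/$pattern )                    # $pattern unquoted on purpose: glob expansion
    eval "$restore_nullglob"
    if [ ${#files[@]} -eq 0 ]; then
        >&2 echo "no files match '${folder}/${pattern}'"
        return 1
    fi
    printf '%s\n' "${files[@]}"
}
# Usage inside the template above (hypothetical pattern):
#   file_list=$(collect_files "${src_folder}" '*.tar') || exit 1
#   mapfile -t files <<< "${file_list}"
#   slk archive -vv "${files[@]}" ${target_folder}
```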
several archivals of single files#
#!/bin/bash
# * You can change the values right of the "=" as you wish.
# * The "%j" in the log file names means that the job id will be inserted
#SBATCH --job-name=test_slk_arch_job # Specify job name
#SBATCH --output=test_job.o%j # name for standard output log file
#SBATCH --error=test_job.e%j # name for standard error output log
#SBATCH --partition=shared # partition name
#SBATCH --ntasks=1 # max. number of tasks to be invoked
#SBATCH --time=08:00:00 # Set a limit on the total run time
#SBATCH --mem=6GB
# make 'module' available when script is submitted from certain environments
source /sw/etc/profile.levante
# ~~~~~~~~~~~~ preparation ~~~~~~~~~~~~
module load slk
# set the source folder
src_folder=/work/xz1234/ex/am/ple
# set target folder for archival
target_folder=/arch/xz1234/${USER}/ex/am/ple
# ~~~~~~~~~~~~ archivals ~~~~~~~~~~~~
# do the archival
echo "doing 'slk archive'"
# ~~~~~~~~~~~~ doing single-file archivals ~~~~~~~~~~~~
# You can do multiple archivals in one script. The exit code of each
# archival should be captured afterwards (get $? in line after slk command)
slk archive ${src_folder}/ ${target_folder}
if [ $? -ne 0 ]; then
    >&2 echo "an error occurred in slk archive call 1"
else
    echo "archival 1 successful"
fi
# second archival and capture exit code (get $? in line after slk cmd)
slk archive ${src_folder}/ ${target_folder}
if [ $? -ne 0 ]; then
    >&2 echo "an error occurred in slk archive call 2"
else
    echo "archival 2 successful"
fi
# ...
# ...
# fifteenth archival and capture exit code (get $? in line after slk cmd)
slk archive ${src_folder}/ ${target_folder}
if [ $? -ne 0 ]; then
    >&2 echo "an error occurred in slk archive call 15"
else
    echo "archival 15 successful"
fi
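Instead of copying the archival block fifteen times, the single-file archivals can also be run in a loop. This is a sketch under the assumption that all files go to the same target folder; archive_each and the variable ARCHIVE_CMD (for example ARCHIVE_CMD=echo for a dry run) are hypothetical names. Like the template, it continues after a failed call; additionally, it returns a non-zero exit code if at least one archival failed.

```shell
#!/bin/bash
# Sketch (hypothetical helper): archive several single files one after another,
# continue after a failed call, and report the number of failures at the end.
#   $1: target folder; remaining arguments: files to archive
# ARCHIVE_CMD is a hypothetical override (e.g. ARCHIVE_CMD=echo for a dry run);
# by default each file is archived with "slk archive FILE TARGET" as above.
archive_each() {
    local target=$1
    shift
    local failed=0 i=0 f
    for f in "$@"; do
        i=$((i + 1))
        # ${ARCHIVE_CMD:-slk archive} is left unquoted so that it splits into words
        if ${ARCHIVE_CMD:-slk archive} "$f" "$target"; then
            echo "archival ${i} successful"
        else
            >&2 echo "an error occurred in slk archive call ${i} ('${f}')"
            failed=$((failed + 1))
        fi
    done
    echo "${failed} of ${i} archivals failed"
    [ "$failed" -eq 0 ]
}
# Usage inside the template above (hypothetical file names):
#   archive_each ${target_folder} ${src_folder}/file_01.nc ${src_folder}/file_02.nc || exit 1
```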
archival of one file with delayed checksum check#
This template/example consists of two files:
archival (also starts the second script):
get and compare checksums:
#!/bin/bash
# * You can change the values right of the "=" as you wish.
# * The "%j" in the log file names means that the job id will be inserted
#SBATCH --job-name=test_slk_arch_job # Specify job name
#SBATCH --output=test_job.o%j # name for standard output log file
#SBATCH --error=test_job.e%j # name for standard error output log
#SBATCH --partition=shared # partition name
#SBATCH --ntasks=1 # max. number of tasks to be invoked
#SBATCH --time=08:00:00 # Set a limit on the total run time
#SBATCH --mem=6GB
# make 'module' available when script is submitted from certain environments
source /sw/etc/profile.levante
# ~~~~~~~~~~~~ preparation ~~~~~~~~~~~~
module load slk
# set the source folder
src_folder=/work/xz1234/ex/am/ple
# set target folder for archival
target_folder=/arch/xz1234/${USER}/ex/am/ple
# set a file to write the result of the checksum comparison into
# ~~~~~~~~~~~~ archivals ~~~~~~~~~~~~
# do the archival
echo "doing 'slk archive'"
# We run the archival and capture the exit code ...
slk archive ${src_folder}/${src_file} ${target_folder}
if [ $? -ne 0 ]; then
    >&2 echo "an error occurred in slk archive call"
    exit 1
fi
echo "archival successful"
# ... then we calculate the checksum and ...
checksum_src_file_raw=`sha512sum ${src_folder}/${src_file}`
if [ $? -ne 0 ]; then
    >&2 echo "checksum could not be calculated"
    exit 1
fi
echo "calculation of checksum successful: ${checksum_src_file_raw}"
echo $checksum_src_file_raw > ${src_folder}/${src_file}.sha512
# ... submit a delayed job for retrieving the checksum from StrongLink
sbatch --begin="now+2hours" --account=${SLURM_JOB_ACCOUNT} ./ ${src_folder}/${src_file}.sha512 ${target_folder}/${src_file} ${checksum_result_file}
#!/bin/bash
# * You can change the values right of the "=" as you wish.
# * The "%j" in the log file names means that the job id will be inserted
#SBATCH --job-name=test_slk_checksum # Specify job name
#SBATCH --output=test_job.o%j # name for standard output log file
#SBATCH --error=test_job.e%j # name for standard error output log
#SBATCH --partition=shared # partition name
#SBATCH --ntasks=1 # max. number of tasks to be invoked
#SBATCH --time=08:00:00 # Set a limit on the total run time
#SBATCH --mem=6GB
# make 'module' available when script is submitted from certain environments
source /sw/etc/profile.levante
# ~~~~~~~~~~~~ get and print arguments ~~~~~~~~~~~~
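if [ "$#" -ne 3 ]; then
    >&2 echo "expected 3 arguments (checksum_file resource_path_hsm checksum_result_file), got $#"
    exit 1
fi
checksum_file=$1
resource_path_hsm=$2
checksum_result_file=$3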
echo "~~~ got this input: ~~~"
echo "checksum_file: ${checksum_file}"
echo "resource_path_hsm: ${resource_path_hsm}"
echo "checksum_result_file: ${checksum_result_file}"
# ~~~~~~~~~~~~ preparation ~~~~~~~~~~~~
module load slk
# ~~~~~~~~~~~~ get source file's checksum ~~~~~~~~~~~~
if [ ! -f ${checksum_file} ]; then
    >&2 echo "file containing the checksum of the source file does not exist: '${checksum_file}'"
    exit 1
fi
checksum_src_file_raw=`cat ${checksum_file}`
checksum_src_file=`echo ${checksum_src_file_raw} | awk '{ print $1 }'`
# ~~~~~~~~~~~~ check if HSM file is available ~~~~~~~~~~~~
# first we check whether the resource/file actually exists in the HSM
echo "doing 'slk_helpers exists'"
slk_helpers exists ${resource_path_hsm}
exit_code=$?
if [ $exit_code -ne 0 ]; then
    if [ $exit_code -eq 1 ]; then
        >&2 echo "file '${resource_path_hsm}' does not exist; stop obtaining a checksum"
        exit 1
    fi
    >&2 echo "an unknown error occurred in 'slk_helpers exists ${resource_path_hsm}' call; exit code: ${exit_code}"
    exit 1
fi
echo "file exists in HSM ('$resource_path_hsm')"
# ~~~~~~~~~~~~ get HSM checksum ~~~~~~~~~~~~
echo "doing 'slk_helpers checksum -t sha512'"
# We first get the checksum and capture the exit code ...
checksum_hsm_file_raw=`slk_helpers checksum -t sha512 ${resource_path_hsm}`
exit_code=$?
if [ $exit_code -ne 0 ]; then
    if [ $exit_code -eq 1 ]; then
        echo "checksum of '${resource_path_hsm}' not yet calculated by StrongLink; resubmitting this job"
        # pass the checksum *file* (first argument of this script), not the checksum value
        sbatch --begin="now+2hours" --account=${SLURM_JOB_ACCOUNT} ${0} ${checksum_file} ${resource_path_hsm} ${checksum_result_file}
        exit 0
    fi
    >&2 echo "an error occurred in slk_helpers checksum call; exit code: ${exit_code}"
    exit 1
fi
echo "getting checksum successful"
checksum_hsm_file=`echo ${checksum_hsm_file_raw} | awk '{ print $1 }'`
# ~~~~~~~~~~~~ compare if checksums are equal ~~~~~~~~~~~~
echo "Result of checksum comparison will be written into ${checksum_result_file} (first line: 0 == checksums equal; 1 == checksums differ)"
if [ "${checksum_src_file}" = "${checksum_hsm_file}" ]; then
    echo "checksums are equal: ${checksum_src_file}"
    exit_code=0
else
    echo "checksums are unequal: ${checksum_src_file} and ${checksum_hsm_file}"
    exit_code=1
fi
echo "${exit_code}" > ${checksum_result_file}
echo "# 0 == checksums equal; 1 == checksums differ" >> ${checksum_result_file}
echo "checksum src file: ${checksum_src_file_raw}" >> ${checksum_result_file}
echo "checksum HSM file: ${checksum_hsm_file} ${resource_path_hsm}" >> ${checksum_result_file}
exit ${exit_code}
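The first line of each result file is 0 or 1, so the results of many delayed checksum jobs can be evaluated with a small helper. A sketch with the hypothetical function name check_checksum_result, assuming the result file layout written by the script above:

```shell
#!/bin/bash
# Sketch (hypothetical helper): evaluate a result file written by the checksum
# script above; its first line is 0 (checksums equal) or 1 (checksums differ).
# Returns 0 or 1 accordingly, and 2 if the file does not exist (yet) or its
# first line is neither 0 nor 1.
check_checksum_result() {
    local result_file=$1 first_line=""
    if [ ! -f "$result_file" ]; then
        >&2 echo "no result file '${result_file}' (yet); the checksum job may still be pending"
        return 2
    fi
    # read only the first line; '|| true' because read fails on an empty file
    # or on a missing final newline
    read -r first_line < "$result_file" || true
    case "$first_line" in
        0) echo "checksums equal: ${result_file}"; return 0 ;;
        1) echo "checksums differ: ${result_file}"; return 1 ;;
        *) >&2 echo "unexpected first line in '${result_file}': '${first_line}'"; return 2 ;;
    esac
}
# Usage (hypothetical folder holding several result files):
#   for f in checksum_results/*.txt; do check_checksum_result "$f"; done
```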