Archivals to tape
file version: 17 Sept 2024
current software versions: slk version 3.3.91; slk_helpers version 1.12.10; slk wrappers 1.2.2
run slk archive
slk archive is available on all Levante nodes. Basic examples of slk archive calls are:
# archive one file, absolute path
$ slk archive /work/bm0146/k204221/some_file.nc /arch/ab01234/c567890/my_data_1/
# archive one file, relative path
$ slk archive some_file.nc /arch/ab01234/c567890/my_data_3/
# archive folder recursively, absolute path
$ slk archive -R /work/bm0146/k204221/some_folder /arch/ab01234/c567890/my_data_4/
# archive folder recursively, relative path, skip hidden files and folders
$ slk archive -x -R some_folder /arch/ab01234/c567890/my_data_5/
# archive multiple files
$ slk archive some_file_a.nc some_file_b.nc some_file_c.nc /arch/ab01234/c567890/my_data_6/
# archive multiple files using wildcards
$ slk archive file_?.nc /arch/ab01234/c567890/my_data_7/
$ slk archive year200[0123].nc /arch/ab01234/c567890/my_data_8/
slk archive may need up to 6 GB of memory (incl. overhead). If you want to archive a file of 1 GB size, you are welcome to do this on the Levante login nodes. If you want to archive multiple GB of data, please run slk archive on the interactive or shared nodes and allocate 6 GB of memory. If slk is killed with a message similar to the following one, too little memory was allocated:
/sw/[...]/bin/slk: line 16: [...] Killed
If you want to run multiple slk archive calls in parallel on one node, please allocate at least 5 GB per call of slk archive. However, we recommend running only one slk archive call with many files at once.
Interactive sessions are started with salloc:
$ salloc --mem=6GB --partition=interactive --account=<YOUR_PROJECT_ACCOUNT>
$ slk archive example.nc /arch/ab0123/c456789
...
$ exit
See also Run slk in the “interactive” partition and Data Processing on Levante for details.
Different batch scripts for archiving data are provided at the bottom of this document. The script archive_slurm_template_multiple_files.sh (section archive multiple files) is the simplest one and will cover most use cases.
# first, adapt source files and target
# then submit:
$ sbatch --account=<YOUR_PROJECT_ACCOUNT> ./archive_slurm_template_multiple_files.sh
slk archive does not always finish successfully. Please check whether it did. Three ways to do this are described in section check if archival command was successful below.
If slk archive fails, please run the same command again. Files which are missing or incomplete in the destination are archived. Files which are already complete there are skipped. If slk archive fails repeatedly, we recommend running it with -vv and checking the log file for error messages or Java exceptions.
Please consider running slk archive with -x to prevent the archival of hidden files and folders (details). Hidden folders such as .git and .svn might contain many small files, and hidden files may contain login information or similar.
file size
We recommend a size between 10 GB and 200 GB per archived file. We strongly recommend not archiving files below 1 GB. Each archived file consumes at least 1 GB of tape quota. Please pack smaller files into tar balls prior to their archival. packems supports you in this process (see below).
Why?
lower limit: It is very inefficient to retrieve a large number of small files from tape compared to a low number of large files of the same total size. This is because on average more tapes have to be read and more spooling has to be done per tape when more files are retrieved. The tape health is reduced and it takes more time to get all small files back.
upper limit: Due to the system setup of our current HSM system, the transfer speed between tape drives and HSM cache decreases when files are larger than 200 GB to 250 GB. This might change in the future.
Retrieving 100 files of 100 MB size takes much longer than retrieving one file of 10 GB size. Additionally, the lifetime of the tape(s) is considerably reduced. When a file is read from tape, the tape drive spools the tape to the start of the file, stops, reads the file with increasing speed until the top speed of 300 MB/s is reached, and stops towards the end of the file. With 100 small files, this operation is repeated up to 100 times, which stresses the tape and takes time. If the files are stored on multiple tapes, the user additionally has to wait for the tapes to be transported from their shelves to the tape drives.
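Files below the recommended minimum size can be found before an archival, e.g. to decide what to pack. The following is a minimal sketch that assumes GNU find. list_small_files and count_small_files are hypothetical helper names, and the default threshold interprets 1 GB as 10^9 bytes.

```shell
#!/usr/bin/env bash
# list_small_files (hypothetical helper): print all regular files below a size
# threshold in a folder, searching recursively.
# Assumption: "1 GB" means 10^9 bytes; pass 1073741824 as second argument to
# use 1 GiB instead.
list_small_files() {
    local src_folder="$1"
    local threshold_bytes="${2:-1000000000}"
    # "-size -Nc" matches files strictly smaller than N bytes. The byte unit "c"
    # avoids the rounding that find applies to units such as "M" or "G".
    find "$src_folder" -type f -size "-${threshold_bytes}c"
}

# count_small_files (hypothetical helper): number of files below the threshold
count_small_files() {
    list_small_files "$@" | wc -l
}
```

For example, list_small_files /work/bm0146/k204221/some_folder > file_list.txt writes such a list to a file. This file could then be piped into packems via -i - as shown in section packems: list of files as input.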
Warning
When you run slk archive -R to archive a folder with model output of a suitable file size, there might be hidden files and/or folders which you accidentally archive as well. These hidden folders, such as .git and .svn, might contain many small files. Please set -x to ignore these files (details).
Please do not archive more than 3 TB with one call of slk archive. If you try to do so, there is a high probability that the connection to StrongLink is interrupted, and this probability increases with the size of the archival. No error message is printed to the terminal. It is only written to the slk log (~/.slk/slk-cli.log):
2022-11-24 11:16:22 INFO Executing command: "archive -R /work/ab1234/c567890/much_data /arch/zy0987/c567890/target"
2022-11-24 11:18:25 ERROR Unexpected exception
java.io.IOException: unexpected end of stream on https://archive.dkrz.de/...
at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.kt:202) ~[slk-cli-tools-3.3.21.jar:?]
[...]
[...]
[...]
... 16 more
2022-11-24 11:18:25 INFO
Archive report
===============
Status: incomplete
Total files uploaded: 0/85083 files [0B/20.3T]
Please search for unexpected end of stream on https://archive.dkrz.de/... to find such events.
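The following minimal sketch covers two checks around this limit. The first reports the total size of the sources before an archival, and the second counts the logged stream interruptions afterwards. It assumes GNU du and interprets 3 TB as 3*10^12 bytes. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the 3 TB recommendation. Assumptions: GNU du is
# available and 3 TB means 3*10^12 bytes.

# total_bytes: apparent size in bytes of all given files and folders
total_bytes() {
    # -s: one line per argument, -b: apparent size in bytes, -c: grand total
    du -sbc -- "$@" | tail -n 1 | cut -f1
}

# exceeds_limit: return 0 (true) if the sources are larger than limit_bytes
exceeds_limit() {
    local limit_bytes="$1"
    shift
    local size
    size=$(total_bytes "$@") || return 2
    [ "$size" -gt "$limit_bytes" ]
}

# count_stream_interruptions: print how often the connection was interrupted
# according to the slk log (default: ~/.slk/slk-cli.log)
count_stream_interruptions() {
    local log_file="${1:-$HOME/.slk/slk-cli.log}"
    grep -cF "unexpected end of stream on https://archive.dkrz.de" "$log_file"
}
```

For example, exceeds_limit 3000000000000 /work/ab1234/c567890/much_data && echo "please split this archival" warns before a too large archival is started.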
Please do not run more than two or three slk archive calls in parallel. Running more than that for a long time might cause a high load on multiple StrongLink nodes. This would cause slow system responses and a higher probability of connection timeouts for all users.
If you need to archive more than 50 TB at once, please contact us in advance via support@dkrz.de.
pack files
Files below 1 GB size should be packed into tar balls or zip files of a maximum size of 200 GB. We recommend using uncompressed tar balls.
You can create the tar balls manually or use packems to do the job for you. A folder or a list of files is provided as input to packems. packems automatically fills the provided files into one or multiple tar balls of a maximum size of 100 GB. In a next step, it copies the new tar balls into the tape archive. Additionally, packems creates and archives a file INDEX.txt, which lists all packed files and the tar ball each of them is in. packems can do all tasks in one step, but we recommend running packing and archiving separately.
Please do not run packems on the Levante login nodes. Run it on shared or interactive nodes instead, as described above for slk archive. Because packems uses slk archive for archival, sufficient memory has to be allocated.
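If you prefer to create the tar balls manually, a minimal sketch with tar could look like the following. pack_folder and the format of its index file are hypothetical. Unlike packems, this sketch writes a single tar ball and does not split the data by size, so please check the total size beforehand.

```shell
#!/usr/bin/env bash
# pack_folder (hypothetical helper): pack a folder into one uncompressed tar
# ball and append the list of packed entries to an index file.
# The index format "<tar ball name> <entry>" is an assumption for illustration.
pack_folder() {
    local src_folder="$1"    # folder with many small files
    local tar_file="$2"      # e.g. /scratch/k/k204221/packed/data_a_001.tar
    local index_file="$3"    # e.g. /scratch/k/k204221/packed/INDEX.txt
    # -c: create, -f: archive file; no -z/-j, i.e. uncompressed as recommended.
    # -C: change into the parent folder so that relative paths are stored.
    tar -cf "$tar_file" -C "$(dirname "$src_folder")" "$(basename "$src_folder")" || return 1
    # -t: list the content; prefix each entry with the tar ball name
    tar -tf "$tar_file" | awk -v t="$(basename "$tar_file")" '{ print t, $0 }' >> "$index_file"
}
```

The resulting tar ball is then archived as usual, e.g. slk archive /scratch/k/k204221/packed/data_a_001.tar /arch/ab01234/c567890/my_data_1/.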
packems: basic archival
# pack data with packems
# -d ... => local tmp destination of the tar balls
# -S ... => destination of the tar balls in the tape archive
# -o data_a => prefix of the tar ball names
# --no-archive => only pack and no archival yet
# /work/bm0146/k204221/archive_this => folder to pack recursively
$ packems \
-d /scratch/k/k204221/packems \
-S /arch/bm0146/k204221/archived \
-o data_a \
--no-archive \
/work/bm0146/k204221/archive_this
...
# look at what is there
$ ls /scratch/k/k204221/packems
data_a_001.tar data_a_002.tar data_a_003.tar ...
# archive the tar balls and create the index file
# --archive-only => only archive the data packed before
$ packems \
-d /scratch/k/k204221/packems \
-S /arch/bm0146/k204221/archived \
-o data_a \
--archive-only \
/work/bm0146/k204221/archive_this
...
packems: change tar ball size
Set the maximum size of the tar balls to 50 GB.
# pack data with packems
# -t ... => target size (a few tar balls might slightly exceed the value)
# -m ... => hard maximum at least if the source files are large
$ packems \
-d /scratch/k/k204221/packems \
-S /arch/bm0146/k204221/archived \
-o data_a \
-t 30 -m 50 \
--no-archive \
/work/bm0146/k204221/archive_this
...
packems: list of files as input
We want to archive all *.nc files from /work/bm0146/k204221/archive_this. For this purpose, we run find ... and pipe its output into packems.
# pack data with packems
# -i - => read input from stdin
$ find /work/bm0146/k204221/archive_this -type f -name '*.nc' | \
packems \
-i - \
-d /scratch/k/k204221/packems \
-S /arch/bm0146/k204221/archived \
-o data_a \
-t 30 -m 50 \
--no-archive
...
We might also pipe the content of a file into packems.
# pack data with packems
# -i - => read input from stdin
$ cat file_list.txt | \
packems \
-i - \
-d /scratch/k/k204221/packems \
-S /arch/bm0146/k204221/archived \
-o data_a \
-t 30 -m 50 \
--no-archive
...
owner, group and permissions
permissions
The permissions of the original file are transferred to the archived file. The owner can modify the permissions with this command:
slk chmod <PERMISSIONS> /arch/ab1234/test.nc
owner
The owner of the archived file is the archiving user. If files should be handed over to another user, please contact support@dkrz.de .
group
The group of the archived file is the default group of the user. Thus, the group is not adapted to the target namespace. The owner of a file can change its group with this command:
slk group <GROUP> /arch/ab1234/test.nc
Skip and ignore files
slk archive automatically skips archiving a file when
a file with the same name already exists in the destination location,
both files are equal in size and
both files have the same modification time.
“modification time” means the mtime timestamp of the file and not the archival or modification time in StrongLink.
Skipping a file is considered a success, not a failure. In this respect, slk archive works the same as rsync.
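The size and mtime part of this skip rule can be sketched for a single file as follows. would_be_skipped is a hypothetical helper. It expects you to supply the size (bytes) and mtime (seconds since epoch) of the archived file, assumes GNU stat, and assumes that StrongLink compares the mtime with one-second precision. How to obtain these values from StrongLink is not covered here.

```shell
#!/usr/bin/env bash
# would_be_skipped (hypothetical helper): return 0 if a local file matches an
# archived file of the same name in size and modification time, i.e. if
# slk archive would skip it according to the rule above.
# Returns 1 if it would be archived and 2 if the local file cannot be read.
would_be_skipped() {
    local local_file="$1"
    local archived_size="$2"     # bytes, supplied by the caller
    local archived_mtime="$3"    # seconds since epoch, supplied by the caller
    local local_size local_mtime
    local_size=$(stat -c '%s' -- "$local_file") || return 2
    local_mtime=$(stat -c '%Y' -- "$local_file") || return 2
    [ "$local_size" -eq "$archived_size" ] && [ "$local_mtime" -eq "$archived_mtime" ]
}
```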
If the parameter -x is set, slk archive ignores all hidden files and directories. A file or directory is considered hidden when its name starts with a dot, e.g. .git, .ipynb_checkpoints and .ssh folders or .gitignore and .config files.
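To preview what -x would ignore, hidden entries can be listed with find. list_hidden is a hypothetical helper. It only reproduces the rule "the name starts with a dot" described above, and it does not list the content of hidden folders separately because the whole folder is ignored.

```shell
#!/usr/bin/env bash
# list_hidden (hypothetical helper): print all hidden files and folders below
# a source folder, i.e. entries whose name starts with a dot.
list_hidden() {
    local src_folder="$1"
    # -mindepth 1: do not report the source folder itself (e.g. ".")
    # -print -prune: report a hidden entry and do not descend into it
    find "$src_folder" -mindepth 1 -name '.*' -print -prune
}
```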
When you run slk archive with -vv, skipped files are listed but ignored files are not.
check if archival command was successful
There are three ways to check this:
look into the textual output of slk archive,
capture the exit code $? (0: successful archival),
look into the slk log file: ~/.slk/slk-cli.log
slk archive may skip the archival of files. Skipping a file is considered a success, not a failure.
check success: evaluate text output
slk archive prints Non-recursive Archive completed if the archival of all files was successful and Non-recursive Archive failed if the archival of at least one file failed. Skipped files are not considered as failed.
In the terminal it will look like this:
# archive a new file
$ slk archive a_file.nc /arch/ab0123/c456789
Non-recursive Archive completed
# archive a file which already exists and will be skipped
$ slk archive already_existing_file.nc /arch/ab0123/c456789
Non-recursive Archive completed
# archival will fail because no write permissions in destination location
$ slk archive a_file.nc /arch/no_permissions
Non-recursive Archive failed
You can print the archival status of each file by appending the flag -vv. In this case, you see whether a file was archived successfully (SUCCESSFUL), was skipped (SKIPPED) or failed (FAILED).
# archive a new file
$ slk archive a_file.nc /arch/ab0123/c456789 -vv
a_file.nc SUCCESSFUL
Non-recursive Archive completed
# archive a file which already exists and will be skipped
$ slk archive already_existing_file.nc /arch/ab0123/c456789 -vv
already_existing_file.nc SKIPPED
Non-recursive Archive completed
# archival will fail because no write permissions in destination location
$ slk archive a_file.nc /arch/no_permissions -vv
a_file.nc FAILED
Non-recursive Archive failed
slk archive prints no file status if the source file does not exist, even if -vv is set.
# archival will fail because source file does not exist
$ slk archive non_existing_file.nc /arch/ab0123/c456789 -vv
Non-recursive Archive failed
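When the -vv output is long, the per-file states can be counted. summarize_vv is a hypothetical helper that relies only on the output format shown above. It assumes that slk archive writes these lines to stdout, so that they can be piped.

```shell
#!/usr/bin/env bash
# summarize_vv (hypothetical helper): count the per-file states in the output
# of "slk archive -vv" read from stdin and report the overall result.
# Expected input lines: "<file> SUCCESSFUL|SKIPPED|FAILED" and a final
# "... Archive completed" or "... Archive failed".
summarize_vv() {
    awk '
        $NF == "SUCCESSFUL" { ok++ }
        $NF == "SKIPPED"    { skipped++ }
        $NF == "FAILED"     { failed++ }
        /Archive (completed|failed)$/ { overall = $NF }
        END {
            printf "successful=%d skipped=%d failed=%d overall=%s\n",
                   ok, skipped, failed, overall
        }'
}
```

For example, slk archive -vv *.nc /arch/ab0123/c456789 | summarize_vv prints a single summary line. If a source file does not exist, no status line is printed for it and only the overall result reports the failure.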
check success: capture exit code
Each command returns an exit code when it ends. The exit code is not printed, but the exit code of the most recent command is stored in the variable $?. An exit code of 0 indicates success. Exit codes >0 indicate a failure or have a special meaning depending on the command.
slk archive returns the exit code 0 on success and 1 on failure:
$ slk archive a_file.nc /arch/ab0123/c456789
Non-recursive Archive completed
$ echo $?
0
$ slk archive non_existing_file.nc /arch/ab0123/c456789
Non-recursive Archive failed
$ echo $?
1
All slk commands only return 0 and 1. In contrast, the slk_helpers return 0, 1, 2 and 3. The exact meaning differs from command to command. In the case of slk_helpers exists, 0 means yes and 1 means no. In the example below, a file without read permissions yields 2:
$ slk_helpers exists /.../existing_file.nc > /dev/null 2>&1
$ echo $?
0
$ slk_helpers exists /.../non_existing_file.nc > /dev/null 2>&1
$ echo $?
1
$ slk_helpers exists /.../no_read_permissions.nc > /dev/null 2>&1
$ echo $?
2
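The following sketch shows how a script might branch on the exit code of slk_helpers exists. check_exists is hypothetical and only interprets the codes shown above. Any other code is reported as "could not check".

```shell
#!/usr/bin/env bash
# check_exists (hypothetical helper): print whether a resource exists in
# StrongLink, based on the exit code of "slk_helpers exists".
# 0: exists, 1: does not exist, other: could not be checked (e.g. 2 for a
# file without read permissions, as in the example above).
check_exists() {
    local resource="$1"
    slk_helpers exists "$resource" > /dev/null 2>&1
    local exit_code=$?
    case $exit_code in
        0) echo "exists: $resource" ;;
        1) echo "does not exist: $resource" ;;
        *) >&2 echo "could not check (exit code $exit_code): $resource" ;;
    esac
    return $exit_code
}
```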
Exit codes are very useful in bash scripts:
#!/bin/bash
slk archive a_file.nc /arch/ab0123/c456789
exit_code=$?
if [ $exit_code -eq 0 ]; then
    echo "archival successful; first try"
    # do some more stuff ...
else
    >&2 echo "error occurred during archival; wait and retry"
    sleep 10
    slk archive a_file.nc /arch/ab0123/c456789
    exit_code=$?
    if [ $exit_code -eq 0 ]; then
        echo "archival successful; second try"
        # do some more stuff ...
    else
        >&2 echo "error occurred during archival; failed twice; exiting"
        exit 1
    fi
fi
check success: slk log file
slk archive only prints sparse information to the terminal. Most error messages and an archival report are written to the slk log file ~/.slk/slk-cli.log. Please do not give other users read permissions to ~/.slk because your slk login token is also stored in that folder.
A successful archival looks like this in the log:
2024-06-12 11:10:53 levante.dkrz.de 197384 INFO Executing command: "archive ..."
2024-06-12 11:11:00 levante.dkrz.de 197384 INFO Non-recursive Archive completed
Archive report
===============
Status: success
Total files uploaded: 1/1 files [3B/3B]
A failed archival might look similar to this in the log:
2024-06-12 11:17:01 levante.dkrz.de 205127 INFO Executing command: "archive ..."
2024-06-12 11:17:02 levante.dkrz.de 205127 ERROR Unexpected exception
java.nio.file.NoSuchFileException: file_20.txt
[...]
Archive report
===============
Status: incomplete
Total files uploaded: 0/0 files [0B/0B]
Another failed archival might look like this in the log:
2023-03-25 01:34:25 xU22 97259 INFO Executing command: "archive ..."
2023-03-25 01:34:27 xU22 97259 ERROR No active nodes. Shutting down...
2023-03-25 01:34:27 xU22 97259 ERROR Failed to upload resource: [...]
GNS Path: [...]
Error: Code: 500, Reason: CONNECTION_ERROR, Message: Cannot connect to websocket, Detailed Message: Cannot connect to websocket
2023-03-25 01:34:29 xU22 97259 INFO Non-recursive Archive failed
Archive report
===============
Status: incomplete
Total files uploaded: 0/1 files [0B/1.5K]
Total files failed: 1/1 files [0B/1.5K]
Connection Error: 1
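The status of the most recent archival can be read from the log with grep. last_archive_status is a hypothetical helper. It assumes the "Status: ..." line format shown in the excerpts above, which may change with future slk versions.

```shell
#!/usr/bin/env bash
# last_archive_status (hypothetical helper): print the value of the last
# "Status:" line in the slk log, e.g. "success" or "incomplete".
last_archive_status() {
    local log_file="${1:-$HOME/.slk/slk-cli.log}"
    # keep only the most recent "Status: ..." line and print its value
    grep -E '^[[:space:]]*Status: ' "$log_file" | tail -n 1 | awk '{ print $2 }'
}
```

To keep other users from reading ~/.slk, chmod -R go-rwx ~/.slk removes all group and other permissions on that folder.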
slk archive failed
What to do when slk archive failed? If you are in a hurry, you can try the quick solution. However, if you have a bit more time, it might be good to find out why slk archive failed. Depending on the situation, the quick solution might be appropriate (most situations) or not.
When slk archive fails, one or more files will probably be flagged as partial file. The important facts on such files are:
A file which is not flagged as partial file has been archived completely.
A file which is flagged as partial file may be an incomplete or a complete file.
slk list does not reliably highlight files flagged as partial file.
slk_helpers has_no_flag_partial -v reliably lists all files flagged as partial file.
A normal user cannot remove a partial file flag from a completely archived file.
quick solution
If the archival was interrupted, please run the same call of slk archive a second time. slk archive will only transfer those files which
have not already been archived,
have only been partly archived (internally flagged as partial file) or
have been modified since the first archival (see skip rules).
You can run slk archive repeatedly until it succeeds. Afterwards, please check for files flagged as partial file and notify us via support@dkrz.de so that the (false) flags can be removed (details). Flagged files are blocked for retrieval.
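The quick solution can be automated in a script. The script re-runs the same call until it succeeds and then lists the remaining partial file flags. archive_until_success and the variable RETRY_WAIT_SECONDS are hypothetical. The sketch assumes that the last argument of the slk archive call is the destination namespace, so options such as -vv have to be placed before the source files.

```shell
#!/usr/bin/env bash
# archive_until_success (hypothetical helper): re-run the same slk archive call
# until it succeeds or max_attempts is reached. Afterwards, files still flagged
# as partial file in the destination are listed so that they can be reported
# to support@dkrz.de.
# Usage: archive_until_success <max_attempts> [options] <sources ...> <destination>
# Assumption: the last argument is the destination namespace.
archive_until_success() {
    local max_attempts="$1"
    shift
    local destination="${!#}"   # last remaining argument
    local attempt
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        if slk archive "$@"; then
            echo "archival successful (attempt ${attempt})"
            # list all files in the destination which are flagged as partial file
            slk_helpers has_no_flag_partial -R -v "$destination"
            return 0
        fi
        >&2 echo "archival failed (attempt ${attempt} of ${max_attempts})"
        # wait before the next attempt (RETRY_WAIT_SECONDS is hypothetical)
        if (( attempt < max_attempts )); then
            sleep "${RETRY_WAIT_SECONDS:-60}"
        fi
    done
    return 1
}
```

For example, archive_until_success 3 -vv *.nc /arch/ab0123/c456789 tries the archival up to three times.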
Find out why archival failed
You will find the most common reasons for slk archive to fail in the table below. If you experience other reasons for failure, please notify us so that we can extend this table.
reason for failure | solution |
---|---|
manually killed by the user (e.g. via CTRL + C) | re-run same |
broken ssh connection | re-run same |
timeout of a SLURM job | re-run same |
archival of a large amount of data (...) | |
killed by the operating system (e.g. allowed memory exceeded) | |
connection timeout to StrongLink | |
no permissions to write into destination path | obtain permissions |
source file(s) do not exist | check source files or command |
partial file flag
Files which are incompletely archived are flagged as partial file. However, completely archived files might also be flagged as partial file when slk archive is interrupted directly after these files were completely archived.
slk list appends (Partial File) in most situations when a file is flagged as partial file. However, when the permissions, ownership, group, path or name of the flagged file were changed, slk list does not print the info (Partial File). Therefore, please do not use slk list to determine whether a file is flagged as partial file or not. Please use slk_helpers has_no_flag_partial -v for this purpose.
When you run the failed slk archive command again, missing or incomplete files are archived properly and their partial file flag is removed. However, completely archived files are not touched and the partial file flag will not be removed from these files.
Listing incompletely archived files
Please run slk_helpers has_no_flag_partial to quickly get a list of possibly incompletely archived files and run slk archive again. If these flags persist, please notify us via support@dkrz.de so that we can perform further checks.
Background on what happens when slk archive fails
When slk archive starts to archive files, it first creates a 0-byte file in the destination location for each source file. Each of these 0-byte files is flagged as partial file. The actual size of each source file is stored hidden in StrongLink. When a file has been completely archived, slk and StrongLink need some time until the partial file flag is removed. The time span between completing the archival and removing the flag increases with the amount of transferred data and with the load on the connected StrongLink node. When slk loses the connection to StrongLink before the flag has been removed, the flag remains set.
If a call of slk archive which transfers many files is killed abruptly, each destination file will be in one of these three conditions:
1. file is complete; partial file is not set anymore
2. file is complete; partial file is still set
3. file is incomplete; partial file is still set
slk list appends (Partial File) to each freshly archived file with this flag. However, when the permissions, ownership, group, path or name of the flagged file were changed, slk list does not print the info (Partial File). Therefore, slk list does not reliably print information on this flag. You can list all files flagged as partial file in a namespace with slk_helpers has_no_flag_partial -R -v <namespace>.
When slk archive is run a second time, it skips all files which are already in the destination and match in size and modification time. The partial file flag is ignored in this comparison. All other files are archived (again) and their partial file flags are removed. However, the partial file flag is not removed from skipped files because their metadata is not touched at all. Please notify us via support@dkrz.de when you own such files, even if you think that they are complete.
Therefore, the partial file flag is a necessary but not a sufficient condition for a file being actually incomplete. Each incompletely archived file is flagged as partial file, but not each flagged file is incomplete.
Files which are flagged as partial file are blocked for retrieval. A user has no possibility to remove a partial file flag from a completely archived file. This has to be done by the StrongLink support. If you own such files, please contact us via support@dkrz.de and send us a list of these files. Beforehand, please make sure via slk archive -vv that these files were actually archived completely.
example failed archival
We want to archive some netCDF files from the current folder to /dkrz_test/techtalk/021. This archival fails and some files are flagged as partial files.
$ slk archive *.nc /dkrz_test/techtalk/021
# some reason ...
Non-recursive Archive failed
$ slk list /dkrz_test/techtalk/021
... 1.1G ... file_001gb_a.nc (Partial File)
... 0 ... file_001gb_b.nc (Partial File)
... 1.1G ... file_001gb_c.nc
... 1.1G ... file_001gb_d.nc
... 1.1G ... file_001gb_e.nc (Partial File)
... 1.1G ... file_001gb_f.nc (Partial File)
... 0 ... file_001gb_g.nc (Partial File)
... 1.1G ... file_001gb_h.nc (Partial File)
... 144.4M ... file_001gb_i.nc (Partial File)
... 0 ... file_001gb_j.nc (Partial File)
Files: 10
If we now modify the permissions of one file, slk list does not print the (Partial File) info for it anymore.
$ slk chmod +r /dkrz_test/techtalk/021/file_001gb_i.nc
$ slk list /dkrz_test/techtalk/021
... 1.1G ... file_001gb_a.nc (Partial File)
... 0 ... file_001gb_b.nc (Partial File)
... 1.1G ... file_001gb_c.nc
... 1.1G ... file_001gb_d.nc
... 1.1G ... file_001gb_e.nc (Partial File)
... 1.1G ... file_001gb_f.nc (Partial File)
... 0 ... file_001gb_g.nc (Partial File)
... 1.1G ... file_001gb_h.nc (Partial File)
... 144.4M ... file_001gb_i.nc
... 0 ... file_001gb_j.nc (Partial File)
Files: 10
We can simply run the failed archival command again as shown further below. If this was not possible or failed again, please contact us via support@dkrz.de . In the past, it was recommended to run a verify job to check for defect files. However, verify jobs do not find all incompletely archived files.
We run slk archive again:
$ slk archive *.nc /dkrz_test/techtalk/021 -vv
file_001gb_a.nc SKIPPED
file_001gb_b.nc SUCCESSFUL
file_001gb_c.nc SKIPPED
file_001gb_d.nc SKIPPED
file_001gb_e.nc SUCCESSFUL
file_001gb_f.nc SKIPPED
file_001gb_g.nc SUCCESSFUL
file_001gb_h.nc SKIPPED
file_001gb_i.nc SUCCESSFUL
file_001gb_j.nc SUCCESSFUL
Non-recursive Archive completed
The file *_e.nc was also overwritten although, based on its (human-readable) size, it might have been OK.
Now, we can list the namespace’s content again:
$ slk list /dkrz_test/techtalk/021
... 1.1G ... file_001gb_a.nc
... 1.1G ... file_001gb_b.nc
... 1.1G ... file_001gb_c.nc
... 1.1G ... file_001gb_d.nc (Partial File)
... 1.1G ... file_001gb_e.nc
... 1.1G ... file_001gb_f.nc
... 1.1G ... file_001gb_g.nc
... 1.1G ... file_001gb_h.nc (Partial File)
... 1.1G ... file_001gb_i.nc
... 1.1G ... file_001gb_j.nc
Files: 10
The two files *_d.nc and *_h.nc, which were already completely archived during the first archival and skipped during the second, are still flagged as partial file. The command slk_helpers has_no_flag_partial -v will report the same. These flags cannot be removed by users, and retrieval of flagged files is not permitted. Please send us an email to support@dkrz.de to have the flags removed.
Validate archivals
In this section, we describe methods to identify defective files, e.g. incompletely archived files (partial files). Defective files can be archived again with the same slk archive command with which they were originally archived. Please make sure that slk archive finished correctly. Complete/intact files are automatically skipped. Methods to verify files are:
check if a file is flagged as partial file
check if a file has already been written to tape
compare the checksums of the source file and of the archived file in StrongLink
In the past, it was recommended to run a verify job to validate files in the cache. However, verify jobs do not find all incompletely archived files.
When you archive important data, you can wait until all files have been written to tape. StrongLink performs a basic verification before writing files to tape. Additionally, for very important data, the checksums should be compared in order to identify bit flips or other issues we are not aware of.
Note
We are not aware of any data which were archived since January 2022, were written to tape and had any defects. Therefore, we currently assume that a file is correct/complete if it is on tape.
check if file flagged as “partial”
The only defects we are aware of in files archived by users from Levante since January 2022 are incompletely archived files caused by aborted archivals. Incompletely archived files are flagged as partial file. Hence, checking a file for this flag is a simple way to see whether a file might be incomplete. However, completely archived files may also be flagged as partial file when slk archive does not finish properly.
Note
A partial file is not necessarily incomplete, but an incomplete file is definitely flagged as partial file.
Please use slk_helpers has_no_flag_partial -v to check whether one or multiple files are flagged as partial file.
$ slk_helpers has_no_flag_partial /dkrz_test/netcdf/20230504c -R -v
/dkrz_test/netcdf/20230504c/file_500mb_d.nc has partial flag
/dkrz_test/netcdf/20230504c/file_500mb_f.nc has partial flag
/dkrz_test/netcdf/20230504c/file_500mb_g.nc has partial flag
Number of files without partial flag: 7/10
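To send a list of flagged files to support@dkrz.de, the paths can be extracted from this output. list_partial_files is hypothetical and parses only the "<path> has partial flag" lines shown above.

```shell
#!/usr/bin/env bash
# list_partial_files (hypothetical helper): print the paths of all files in a
# namespace which are flagged as partial file, one per line.
list_partial_files() {
    local namespace="$1"
    # keep only "<path> has partial flag" lines and strip the suffix
    slk_helpers has_no_flag_partial -R -v "$namespace" | sed -n 's/ has partial flag$//p'
}
```

For example, list_partial_files /dkrz_test/netcdf/20230504c > partial_files.txt writes the list to a file.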
Please do not trust the output of slk list with respect to the existence of the partial file flag because the flag might be hidden in some situations.
$ slk list /dkrz_test/netcdf/20230504c
-rwxr-xr-x- k204221 bm0146 553.9M 19 Jul 2021 02:18 file_500mb_d.nc
-rw-r--r--- k204221 bm0146 553.9M 19 Jul 2021 02:18 file_500mb_e.nc
-rw-r--r--- k204221 bm0146 553.9M 19 Jul 2021 02:18 file_500mb_f.nc (Partial File)
-rw-r--r--- k204221 bm0146 554.0M 19 Jul 2021 02:18 file_500mb_g.nc (Partial File)
Files: 4
The (Partial File) info is not displayed if the file was moved or renamed or if the permissions, group or owner of the file were changed. This is a known slk bug.
Please notify us via support@dkrz.de when you own files flagged as partial file, even if you think they are OK. Please run slk archive -vv ... beforehand. We will run an additional check and request the StrongLink support to remove these flags.
See also
Further examples on the usage of slk_helpers has_no_flag_partial are on page slk usage examples.
check if a file is on tape
StrongLink performs a basic file verification prior to writing a file to tape. Files which fail this verification are not written to tape. You can use the command slk_helpers is_on_tape to check whether a file or all files in a namespace have already been written to tape.
# check a single file which is on tape
$ slk_helpers is_on_tape /arch/bm0146/k204221/iow/INDEX.txt
File is on tape
# check a directory of which all contained files are on tape
$ slk_helpers is_on_tape /arch/bm0146/k204221/iow -R
All files are on tape
# check a directory of which not all contained files are on tape
$ slk_helpers is_on_tape /dkrz_test/techtalk/001 -R
Not all files are on tape.
If you wish to print how many files were checked or to generate a list of files which have not been written to tape yet, please run the command with -v.
# check a single file which is on tape
$ slk_helpers is_on_tape /arch/bm0146/k204221/iow/INDEX.txt -R -v
Number of files stored on tape: 1/1
# check a directory of which all contained files are on tape
$ slk_helpers is_on_tape /arch/bm0146/k204221/iow -R -v
Number of files stored on tape: 23/23
# check a directory of which not all contained files are on tape
$ slk_helpers is_on_tape /dkrz_test/techtalk/001 -R -v
/dkrz_test/techtalk/001/file_01.txt is not on tape
/dkrz_test/techtalk/001/file_02.txt is not on tape
/dkrz_test/techtalk/001/file_00.txt is not on tape
Number of files stored on tape: 0/3
If you wish to print all checked files, please run the command with -vv.
# check a single file which is on tape
$ slk_helpers is_on_tape /arch/bm0146/k204221/iow/INDEX.txt -R -vv
/arch/bm0146/k204221/iow/INDEX.txt is on tape
Number of files stored on tape: 1/1
# check a directory of which all contained files are on tape
$ slk_helpers is_on_tape /arch/bm0146/k204221/iow -R -vv
/arch/bm0146/k204221/iow/iow_data_002.tar is on tape
/arch/bm0146/k204221/iow/iow_data_001.tar is on tape
[...]
/arch/bm0146/k204221/iow/iow_data2_003.tar is on tape
Number of files stored on tape: 23/23
# check a directory of which not all contained files are on tape
$ slk_helpers is_on_tape /dkrz_test/techtalk/001 -R -vv
/dkrz_test/techtalk/001/file_01.txt is not on tape
/dkrz_test/techtalk/001/file_02.txt is not on tape
/dkrz_test/techtalk/001/file_00.txt is not on tape
Number of files stored on tape: 0/3
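Before deleting source data, a script might wait until all files are on tape. all_on_tape, wait_until_on_tape and the variable POLL_SECONDS are hypothetical. The sketch parses the summary line printed with -v rather than the exit code of slk_helpers is_on_tape, because the meaning of that exit code is not described here.

```shell
#!/usr/bin/env bash
# all_on_tape (hypothetical helper): return 0 if all files below a namespace
# are on tape, 1 if not, and 2 if the output could not be interpreted.
# It parses the summary line "Number of files stored on tape: X/Y".
all_on_tape() {
    local namespace="$1"
    local summary on_tape total
    summary=$(slk_helpers is_on_tape "$namespace" -R -v \
              | grep '^Number of files stored on tape:')
    [ -n "$summary" ] || return 2
    summary=${summary##*: }   # keep "X/Y"
    on_tape=${summary%/*}     # X
    total=${summary#*/}       # Y
    case "$on_tape$total" in
        ''|*[!0-9]*) return 2 ;;   # not two plain numbers
    esac
    [ "$on_tape" -eq "$total" ]
}

# wait_until_on_tape (hypothetical helper): poll until all files are on tape;
# stop with 2 if the output of slk_helpers could not be interpreted.
wait_until_on_tape() {
    local namespace="$1"
    local rc
    while true; do
        all_on_tape "$namespace"
        rc=$?
        [ "$rc" -eq 0 ] && return 0
        [ "$rc" -eq 2 ] && return 2
        sleep "${POLL_SECONDS:-3600}"
    done
}
```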
compare checksums
StrongLink calculates two types of checksums for files: sha512 and adler32. It might take a few hours after the archival until the checksums are calculated. If no checksum is available a day after the archival finished and the file size is larger than 0 bytes, please contact support@dkrz.de.
The checksums from StrongLink are obtained via slk_helpers checksum RESOURCE. The sha512 checksum of a local file is calculated via sha512sum.
# archive a file
$ slk archive test.nc /arch/bm0146/k204221/test_data
[========================================\] 100% complete. Files archived: 1/1, [1.7K/1.7K].
# wait some hours ...
# calculate the checksum of the local file
$ sha512sum test.nc
22ef50dcbd179775b5a6e632b02d8b99ddf16609f342a66c1fae818ed42a49d5a33af3dd8e059fa7a743f5b615620f2ad87a3d01bf3e2e0cde0e8a607bc1f15d test.nc
# get the checksum of the archived file
$ slk_helpers checksum -t sha512 /arch/bm0146/k204221/test_data/test.nc
22ef50dcbd179775b5a6e632b02d8b99ddf16609f342a66c1fae818ed42a49d5a33af3dd8e059fa7a743f5b615620f2ad87a3d01bf3e2e0cde0e8a607bc1f15d
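The manual comparison above can be put into a function. compare_sha512 is hypothetical and assumes that slk_helpers checksum -t sha512 prints only the checksum, as in the example above.

```shell
#!/usr/bin/env bash
# compare_sha512 (hypothetical helper): compare the sha512 checksum of a local
# file with the one StrongLink stores for the archived copy.
# Returns 0 if equal, 1 if different and 2 if a checksum could not be obtained
# (e.g. because StrongLink has not calculated it yet).
compare_sha512() {
    local local_file="$1"
    local archived_file="$2"
    local local_sum archived_sum
    # sha512sum prints "<checksum>  <file name>"; keep only the checksum
    local_sum=$(sha512sum -- "$local_file" | awk '{ print $1 }')
    archived_sum=$(slk_helpers checksum -t sha512 "$archived_file") || return 2
    archived_sum=${archived_sum//[[:space:]]/}   # drop stray whitespace
    [ -n "$local_sum" ] && [ -n "$archived_sum" ] || return 2
    if [ "$local_sum" = "$archived_sum" ]; then
        echo "checksums match: $local_file"
        return 0
    fi
    >&2 echo "checksums differ: $local_file vs. $archived_file"
    return 1
}
```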
Archival wrapper for SLURM
In contrast to slk retrieve, we do not provide SLURM wrapper scripts for slk archive in the slk module on Levante. Instead, you will find several SLURM script templates for archivals below.
Archival script templates
Several script templates for different use cases are printed below and available for download:
archive multiple files: archive_slurm_template_multiple_files.sh
several archivals of single files: archive_slurm_template_single_files.sh
archival of one file and checksum check: archive_slurm_template_single_file_with_checksum_check.sh and archive_slurm_template_get_and_compare_checksum.sh
When you use these templates, you need to make a few adaptations (not each script has all of them):
modify src_folder: replace /work/xz1234/ex/am/ple by the actual source folder on the Lustre file system
modify target_folder: replace /arch/xz1234/${USER}/ex/am/ple by something appropriate for your project
modify src_file: replace file.nc by the name of the actual source file
Please run/submit these scripts via sbatch as described in Run slk as batch job and SLURM Introduction.
archive multiple files
#!/bin/bash
# HINT:
# * You can change the values right of the "=" as you wish.
# * The "%j" in the log file names means that the job id will be inserted
#SBATCH --job-name=arch_files # Specify job name
#SBATCH --output=test_job.o%j # name for standard output log file
#SBATCH --error=test_job.e%j # name for standard error output log
#SBATCH --partition=shared # partition name
#SBATCH --ntasks=1 # max. number of tasks to be invoked
#SBATCH --time=08:00:00 # Set a limit on the total run time
#SBATCH --mem=6GB
# make 'module' available when script is submitted from certain environments
source /sw/etc/profile.levante
# ~~~~~~~~~~~~ preparation ~~~~~~~~~~~~
module load slk
# set the source folder
src_folder=/work/xz1234/ex/am/ple
# set target folder for archival
target_folder=/arch/xz1234/${USER}/ex/am/ple
# ~~~~~~~~~~~~ archivals ~~~~~~~~~~~~
# do the archival
echo "doing 'slk archive'"
# ~~~~~~~~~~~~ doing multi-file archival ~~~~~~~~~~~~
# You can archive multiple files at once -- either by listing them or by
# using wildcard expressions.
slk archive -vv ${src_folder}/file01.nc ${src_folder}/file02.nc ${src_folder}/*.tar ${target_folder}
if [ $? -ne 0 ]; then
>&2 echo "an error occurred in slk archive call"
else
echo "archival of all files successful"
fi
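If a wildcard such as ${src_folder}/*.tar in the template above matches no file, bash passes the pattern on unchanged and slk archive receives a path that does not exist. The following minimal sketch expands the patterns first (using bash's nullglob option) and stops the job when nothing matched. The function collect_files and the array matched_files are hypothetical names, and paths containing whitespace are not supported.

```shell
#!/bin/bash
# collect_files PATTERN...
# Hypothetical helper: stores every existing file that matches one of the
# glob PATTERNs in the global array 'matched_files'; returns 1 if none matched.
# Assumes that the paths contain no whitespace.
collect_files() {
    matched_files=()
    local pattern candidate restore_nullglob
    # remember the current nullglob setting ("shopt -s/-u nullglob")
    restore_nullglob=$(shopt -p nullglob)
    # with nullglob, a pattern without matches expands to nothing
    shopt -s nullglob
    for pattern in "$@"; do
        # ${pattern} is unquoted on purpose so that the glob is expanded
        for candidate in ${pattern}; do
            if [ -e "${candidate}" ]; then
                matched_files+=( "${candidate}" )
            fi
        done
    done
    ${restore_nullglob}
    [ "${#matched_files[@]}" -gt 0 ]
}

# example: quote the patterns so that they reach the function unexpanded
# if collect_files "${src_folder}/file01.nc" "${src_folder}/file02.nc" "${src_folder}/*.tar"; then
#     slk archive -vv "${matched_files[@]}" "${target_folder}"
# else
#     >&2 echo "no files found to archive"
#     exit 1
# fi
```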
several archivals of single files#
#!/bin/bash
# HINT:
# * You can change the values right of the "=" as you wish.
# * The "%j" in the log file names means that the job id will be inserted
#SBATCH --job-name=test_slk_arch_job # Specify job name
#SBATCH --output=test_job.o%j # name for standard output log file
#SBATCH --error=test_job.e%j # name for standard error output log
#SBATCH --partition=shared # partition name
#SBATCH --ntasks=1 # max. number of tasks to be invoked
#SBATCH --time=08:00:00 # Set a limit on the total run time
#SBATCH --mem=6GB
# make 'module' available when script is submitted from certain environments
source /sw/etc/profile.levante
# ~~~~~~~~~~~~ preparation ~~~~~~~~~~~~
module load slk
# set the source folder
src_folder=/work/xz1234/ex/am/ple
# set target folder for archival
target_folder=/arch/xz1234/${USER}/ex/am/ple
# ~~~~~~~~~~~~ archivals ~~~~~~~~~~~~
# do the archival
echo "doing 'slk archive'"
# ~~~~~~~~~~~~ doing single-file archivals ~~~~~~~~~~~~
# You can do multiple archivals in one script. The exit code of each
# archival should be captured afterwards (get $? in line after slk command)
slk archive ${src_folder}/file01.nc ${target_folder}
if [ $? -ne 0 ]; then
>&2 echo "an error occurred in slk archive call 1"
else
echo "archival 1 successful"
fi
# second archival and capture exit code (get $? in line after slk cmd)
slk archive ${src_folder}/file02.nc ${target_folder}
if [ $? -ne 0 ]; then
>&2 echo "an error occurred in slk archive call 2"
else
echo "archival 2 successful"
fi
# ...
# ...
# fifteenth archival and capture exit code (get $? in line after slk cmd)
slk archive ${src_folder}/file15.nc ${target_folder}
if [ $? -ne 0 ]; then
>&2 echo "an error occurred in slk archive call 15"
else
echo "archival 15 successful"
fi
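The fifteen nearly identical blocks above can also be written as a loop. This minimal sketch keeps one slk archive call per file and reports every failed call. The function archive_each and the variable ARCHIVE_CMD (defaulting to slk archive, and only there so that the loop can be tried without slk) are hypothetical.

```shell
#!/bin/bash
# archive_each TARGET_FOLDER FILE...
# Hypothetical helper: archives every FILE in a separate call and reports the
# result of each call; returns 0 only if all calls succeeded.
# ARCHIVE_CMD (hypothetical) defaults to "slk archive"; it can be overridden
# to try the loop without slk.
archive_each() {
    local target_folder=$1
    shift
    local src
    local i=0
    local failed=0
    for src in "$@"; do
        i=$((i + 1))
        # unquoted on purpose: "slk archive" has to become two words
        if ${ARCHIVE_CMD:-slk archive} "${src}" "${target_folder}"; then
            echo "archival ${i} successful: ${src}"
        else
            >&2 echo "an error occurred in slk archive call ${i}: ${src}"
            failed=$((failed + 1))
        fi
    done
    echo "${failed} of ${i} archivals failed"
    [ "${failed}" -eq 0 ]
}

# example with the files of the template above (bash expands file{01..15}.nc
# to file01.nc ... file15.nc):
# archive_each "${target_folder}" "${src_folder}"/file{01..15}.nc || exit 1
```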
archival of one file with delayed checksum check#
This template/example consists of two files:
archival (also starts the second script): archive_slurm_template_single_file_with_checksum_check.sh
get and compare checksums: archive_slurm_template_get_and_compare_checksum.sh
archive_slurm_template_single_file_with_checksum_check.sh#
#!/bin/bash
# HINT:
# * You can change the values right of the "=" as you wish.
# * The "%j" in the log file names means that the job id will be inserted
#SBATCH --job-name=test_slk_arch_job # Specify job name
#SBATCH --output=test_job.o%j # name for standard output log file
#SBATCH --error=test_job.e%j # name for standard error output log
#SBATCH --partition=shared # partition name
#SBATCH --ntasks=1 # max. number of tasks to be invoked
#SBATCH --time=08:00:00 # Set a limit on the total run time
#SBATCH --mem=6GB
# make 'module' available when script is submitted from certain environments
source /sw/etc/profile.levante
# ~~~~~~~~~~~~ preparation ~~~~~~~~~~~~
module load slk
# set the source folder
src_folder=/work/xz1234/ex/am/ple
src_file=file.nc
# set target folder for archival
target_folder=/arch/xz1234/${USER}/ex/am/ple
# set a file to write the result of the checksum comparison into
checksum_result_file=${src_folder}/${src_file}.chk
# ~~~~~~~~~~~~ archivals ~~~~~~~~~~~~
# do the archival
echo "doing 'slk archive'"
# We run the archival and capture the exit code ...
slk archive ${src_folder}/${src_file} ${target_folder}
if [ $? -ne 0 ]; then
>&2 echo "an error occurred in slk archive call"
exit 1
else
echo "archival successful"
fi
# ... then we calculate the checksum and ...
checksum_src_file_raw=`sha512sum ${src_folder}/${src_file}`
if [ $? -ne 0 ]; then
>&2 echo "checksum could not be calculated"
exit 1
else
echo "calculation of checksum successful: ${checksum_src_file_raw}"
fi
echo "${checksum_src_file_raw}" > ${src_folder}/${src_file}.sha512
# ... submit a delayed job for retrieving the checksum from StrongLink; the second
# script is expected in the directory from which this job was submitted
sbatch --begin="now+2hours" --account=${SLURM_JOB_ACCOUNT} ./archive_slurm_template_get_and_compare_checksum.sh ${src_folder}/${src_file}.sha512 ${target_folder}/${src_file} ${checksum_result_file}
archive_slurm_template_get_and_compare_checksum.sh#
#!/bin/bash
# HINT:
# * You can change the values right of the "=" as you wish.
# * The "%j" in the log file names means that the job id will be inserted
#SBATCH --job-name=test_slk_checksum # Specify job name
#SBATCH --output=test_job.o%j # name for standard output log file
#SBATCH --error=test_job.e%j # name for standard error output log
#SBATCH --partition=shared # partition name
#SBATCH --ntasks=1 # max. number of tasks to be invoked
#SBATCH --time=08:00:00 # Set a limit on the total run time
#SBATCH --mem=6GB
# make 'module' available when script is submitted from certain environments
source /sw/etc/profile.levante
# ~~~~~~~~~~~~ get and print arguments ~~~~~~~~~~~~
if [ "$#" -ne 3 ]; then
echo -1
>&2 echo "need three input arguments (got $#): FILE_CONTAINING_CHECKSUM_OF_SRC_FILE RESOURCE_PATH_HSM CHECKSUM_COMPARISON_RESULT_FILE"
exit 1
fi
checksum_file=$1
resource_path_hsm=$2
checksum_result_file=$3
echo "~~~ got this input: ~~~"
echo "checksum_file: ${checksum_file}"
echo "resource_path_hsm: ${resource_path_hsm}"
echo "checksum_result_file: ${checksum_result_file}"
# ~~~~~~~~~~~~ preparation ~~~~~~~~~~~~
module load slk
# ~~~~~~~~~~~~ get source file's checksum ~~~~~~~~~~~~
if [ ! -f ${checksum_file} ]; then
>&2 echo "file containing the checksum of the source file does not exist: '${checksum_file}'"
exit 1
fi
checksum_src_file_raw=`cat ${checksum_file}`
checksum_src_file=`echo ${checksum_src_file_raw} | awk '{ print $1 }'`
# ~~~~~~~~~~~~ check if HSM file is available ~~~~~~~~~~~~
# first we check whether the resource/file actually exists in the HSM
echo "doing 'slk_helpers exists'"
slk_helpers exists ${resource_path_hsm}
exit_code=$?
if [ $exit_code -ne 0 ]; then
if [ $exit_code -eq 1 ]; then
>&2 echo "file '${resource_path_hsm}' does not exist in the HSM; stop obtaining a checksum"
exit 1
else
>&2 echo "an unknown error occurred in 'slk_helpers exists ${resource_path_hsm}' call; exit code: ${exit_code}"
exit 1
fi
else
echo "file exists in HSM ('$resource_path_hsm')"
fi
# ~~~~~~~~~~~~ get HSM checksum ~~~~~~~~~~~~
echo "doing 'slk_helpers checksum -t sha512'"
# We request the checksum from StrongLink and capture the exit code ...
checksum_hsm_file_raw=`slk_helpers checksum -t sha512 ${resource_path_hsm}`
exit_code=$?
if [ $exit_code -ne 0 ]; then
if [ $exit_code -eq 1 ]; then
echo "checksum of '${resource_path_hsm}' not yet calculated by StrongLink; resubmitting this job"
sbatch --begin="now+2hours" --account=${SLURM_JOB_ACCOUNT} ${0} ${checksum_file} ${resource_path_hsm} ${checksum_result_file}
exit 0
else
>&2 echo "an error occurred in slk_helpers checksum call; exit code: ${exit_code}"
exit 1
fi
else
echo "getting checksum successful"
fi
checksum_hsm_file=`echo ${checksum_hsm_file_raw} | awk '{ print $1 }'`
# ~~~~~~~~~~~~ compare if checksums are equal ~~~~~~~~~~~~
echo "Result of checksum comparison will be written into ${checksum_result_file} (first line: 0 == checksums equal; 1 == checksums differ)"
if [ "${checksum_src_file}" = "${checksum_hsm_file}" ]; then
echo "checksums are equal: ${checksum_src_file}"
exit_code=0
else
echo "checksums are unequal: ${checksum_src_file} and ${checksum_hsm_file}"
exit_code=1
fi
echo "${exit_code}" > ${checksum_result_file}
echo "# 0 == checksums equal; 1 == checksums differ" >> ${checksum_result_file}
echo "checksum src file: ${checksum_src_file_raw}" >> ${checksum_result_file}
echo "checksum HSM file: ${checksum_hsm_file} ${resource_path_hsm}" >> ${checksum_result_file}
exit ${exit_code}
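The result file written at the end of the script above starts with 0 (checksums equal) or 1 (checksums differ). The following is a minimal sketch for evaluating such files in a later step. The function name checksum_verified is hypothetical. A missing result file (for example because the delayed job is still waiting) is treated as not verified.

```shell
#!/bin/bash
# checksum_verified RESULT_FILE
# Hypothetical helper: returns 0 only if RESULT_FILE exists and its first line
# is "0", i.e. archive_slurm_template_get_and_compare_checksum.sh found equal
# checksums; returns 1 otherwise (file missing or checksums differ).
checksum_verified() {
    local result_file=$1
    local first_line
    [ -f "${result_file}" ] || return 1
    IFS= read -r first_line < "${result_file}" || return 1
    [ "${first_line}" = "0" ]
}

# example: report all verified archivals of one source folder
# for chk in "${src_folder}"/*.chk; do
#     checksum_verified "${chk}" && echo "verified: ${chk%.chk}"
# done
```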