Retrievals from tape#
file version: 05 Feb 2024
current software versions: slk version 3.3.91; slk_helpers version 1.11.2; slk wrappers 1.2.2
Introduction and Summary#
On Levante login nodes, the slk retrieve command can only retrieve one file at a time. Please use slk retrieve via the compute, shared or interactive partitions. Please always allocate 6 GB of memory (--mem=6GB, see Recommendations for usage of slk retrieve). If your slk is killed with a message like /sw/[...]/bin/slk: line 16: [...] Killed, please inform the DKRZ support (support@dkrz.de) and allocate 8 GB or 10 GB of memory. If you wish to use slk retrieve interactively, please start an interactive batch session via the interactive partition as follows (see also Run slk in the “interactive” partition; details on salloc: Data Processing on Levante):
salloc --mem=6GB --partition=interactive --account=YOUR_PROJECT_ACCOUNT
Note
Please note the section Aggregate file retrievals in order to speed up your retrievals.
Warning
If you retrieve more than 10 files at once, please first run slk_helpers gfbt PATH -R --count-tapes or slk_helpers gfbt --search-id SEARCH_ID --count-tapes. If the files are located on more than five tapes, please split the retrieval into multiple parts as described below (see also short command manual and usage examples). Currently, the whole StrongLink system slows down considerably when single retrievals/recalls access too many tapes at once. This issue is highly prioritized and is expected to be solved in the future. Therefore, splitting retrievals as described above is very important to keep the StrongLink system running fast.
Recommendations for usage of slk retrieve#
High memory usage: Please allocate 6 GB of memory for each call of slk retrieve (argument for sbatch and salloc: --mem=6GB). Otherwise, your commands might be killed by the operating system. If you plan to run three retrievals in parallel, please allocate 18 GB, and so on.
On Levante, recommended: Please stripe the target folder of retrievals as follows: lfs setstripe -E 1G -c 1 -S 1M -E 4G -c 4 -S 1M -E -1 -c 8 -S 1M TARGET_FOLDER.
Check exit codes: Information on the success or failure of slk retrieve will not be printed into the SLURM log automatically. We strongly suggest checking the exit code of each slk retrieve and printing it to the job log. The variable $? holds the exit code of the preceding command (see How do I capture exit codes?; example: our example scripts for usage examples).
Please also be aware that some issues on Levante might cause slk to crash randomly (see section slk issues on Levante on page Known Issues).
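A minimal sketch of such an exit-code check (the source and target paths are placeholders; the script templates at the end of this page use the same pattern):
slk retrieve /arch/xz1234/ex/am/ple/file1.tar /work/xz1234/ex/am/ple
exit_code=$?
if [ ${exit_code} -ne 0 ]; then
    >&2 echo "slk retrieve failed with exit code ${exit_code}"
else
    echo "retrieval successful"
fi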
Resume interrupted retrievals#
When slk retrieve is started, it creates a 0-byte file with a temporary name for each file that has to be retrieved. The temporary name has the format ~[FILENAME][RANDOM_NUMBER].retrieve. The temporary files are filled one by one. Each file which is fully retrieved is renamed to its final name.
If a slk retrieve call is interrupted, please react differently depending on whether the requested files are in the HSM cache or only on tape.
Files are only on tape#
If slk retrieve has to get files from tape, it starts a recall job in StrongLink. Running slk recall does the same. If slk retrieve / slk recall is killed (e.g. by CTRL + C or an ssh timeout), the StrongLink-internal recall job is not aborted. Instead, the job will run until it is completed or manually aborted by a StrongLink admin. Please avoid starting the same retrieval/recall from tape repeatedly if the original recall job is still queued or running. You can check this via slk_helpers job_status JOB_ID as described in Check job and queue status. When the job is completed, all files should be in the HSM cache and you can do a quick retrieval from there. Please contact the DKRZ support (support@dkrz.de) if the files are not in the HSM cache after the second try.
Note
If you need to retrieve files in the evening, run slk recall in the morning, write down the job id and check the job status from time to time. If the system is very busy (check via slk_helpers job_queue), you might also run slk recall the day before. If the HSM cache is extremely full, files might be removed from it every night. But commonly, they remain there for a few days.
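A minimal sketch of this workflow (the path and job id are placeholders; the job id has to be taken from the “Created copy job with id” line in ~/.slk/slk-cli.log, see Check job and queue status below):
$ slk recall /arch/xz1234/ex/am/ple/file_2.txt   # start the recall in the morning (placeholder path)
$ tail -n 2 ~/.slk/slk-cli.log                   # note the recall job id printed there
$ slk_helpers job_status 84835                   # check from time to time; 84835 is a placeholder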
Files are in the HSM cache#
Short answer: Please run slk retrieve with the option -s to skip files which have been previously retrieved.
Details: If slk retrieve /source /target is run more than once, then all files from /source are retrieved again and already existing files are overwritten. If existing files should be skipped, please run slk retrieve with -s (for skip). If you want to retrieve all files again but wish to keep the existing files, please run slk retrieve with -d (for duplicate).
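For example, to resume an interrupted retrieval without overwriting the files which already arrived (paths are placeholders):
$ slk retrieve -s /ex/am/ple/data /tar/get/folder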
Speed up your retrievals#
When you retrieve data from the HSM, these data are first copied from a tape to the HSM cache and, in a second step, copied from the HSM cache to your local file system (Lustre). If the data are already located in the cache, the first step is automatically omitted. Commonly, copying a file from tape to cache takes longer than copying it from cache to the Lustre file system. Therefore, it is good to know where the needed files are currently stored in order to estimate the time needed for retrieval. Below are some hints on how to improve the speed of your file retrievals.
Is a file stored in the HSM cache and/or on tape?#
The output of slk list indicates whether a file is stored in the HSM cache or not. If the 11th character of the permissions string is a t, the file is stored exclusively on tape. If it is a -, the file is stored in the cache. In the latter case, the user does not know whether the file is additionally stored on tape or not, for example if the file was archived shortly before slk list was performed and had not yet been transferred to tape. Example:
$ slk list /arch/ex/am/ple
-rw-r--r--- k204221 bm0146 11 B 02 Mar 2021 file_1.txt
-rw-r--r--t k204221 bm0146 16 B 02 Mar 2021 file_2.txt
-rw-r--r--t k204221 bm0146 15 B 02 Mar 2021 file_3.txt
Example explained: The file file_1.txt is stored in the cache and can be quickly retrieved. The files file_2.txt and file_3.txt are only stored on tape and their retrieval will take more time.
Additionally, the slk_helpers feature a command iscached, which prints information on the storage location. Please note that the exit code of this command is 1 if the tested file is not cached (see How do I capture exit codes?). Example:
$ slk_helpers iscached /arch/ex/am/ple/file_2.txt
File is not cached
$ echo $?
1
$ slk_helpers iscached /arch/ex/am/ple/file_1.txt
File is cached
$ echo $?
0
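Because iscached reports the result via its exit code, it can be used directly in a shell conditional, for example (path as in the example above):
$ if slk_helpers iscached /arch/ex/am/ple/file_2.txt > /dev/null; then echo "file is in the cache, quick retrieval possible"; else echo "file has to be read from tape, expect a longer retrieval"; fi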
Aggregate file retrievals#
Warning
If you retrieve more than 10 files at once, please first run slk_helpers gfbt PATH -R --count-tapes or slk_helpers gfbt --search-id SEARCH_ID --count-tapes. If the files are located on more than five tapes, please split the retrieval into multiple parts as described below (see also short command manual and usage examples). Currently, the whole StrongLink system slows down considerably when single retrievals/recalls access too many tapes at once. This issue is highly prioritized and is expected to be solved in the future. Therefore, splitting retrievals as described above is very important to keep the StrongLink system running fast.
When several files shall be retrieved, it is most efficient to retrieve all files at once with one call of slk retrieve instead of retrieving each file with an individual call of slk retrieve. First, all files that are stored on one tape will be read from that tape at once. Individual slk retrieve calls would cause the tape to be ejected and brought back to its shelf after each retrieval. Second, one slk retrieve command might use several tape drives to copy files from distinct tapes in parallel. Using one slk retrieve call per file does not allow using this feature. Therefore, it is useful to aggregate several file retrievals into one.
The best performance and error control is achieved if all needed data stored on one tape is retrieved/recalled with one call of slk retrieve / slk recall, i.e. each slk retrieve should target only one tape. This can be done via the command slk_helpers group_files_by_tape, which is available since slk_helpers release 1.6.0 (Dec 2022). group_files_by_tape might be used as described below.
Recursive retrievals#
When an entire namespace or most files of one namespace shall be retrieved, this namespace should be retrieved via one recursive slk retrieve call. Example:
$ slk list /ex/am/ple/data | cat
-rw-r--r--- k204221 bm0146 1.2M 10 Jun 2020 INDEX.txt
-rw-r--r--t k204221 bm0146 19.5G 05 Jun 2020 out_data_001.tar
-rw-r--r--t k204221 bm0146 19.0G 05 Jun 2020 out_data_002.tar
-rw-r--r--t k204221 bm0146 19.4G 05 Jun 2020 out_data_003.tar
-rw-r--r--t k204221 bm0146 19.3G 05 Jun 2020 out_data_004.tar
-rw-r--r--t k204221 bm0146 19.1G 05 Jun 2020 out_data_005.tar
-rw-r--r--t k204221 bm0146 7.8G 05 Jun 2020 out_data_006.tar
Files: 7
$ slk retrieve -R /ex/am/ple/data /tar/get/folder
[ ] 100% complete. Files retrieved: 7/7, [105.3G/105.3G].
$ ls /tar/get/folder
INDEX.txt out_data_001.tar out_data_002.tar out_data_003.tar out_data_004.tar out_data_005.tar out_data_006.tar
Run a search query and retrieve search results#
If only some but not all files of one namespace, or a set of files distributed over different namespaces, shall be retrieved, it is reasonable to define a search query to find these files and retrieve them via their search id. Regular expressions can be used to match multiple filenames, but they cannot be used to match the path. Bash wildcards/globs do not work. In the example below, the files out_data_002.tar, out_data_005.tar and out_data_006.tar shall be retrieved:
$ slk list /ex/am/ple/data | cat
-rw-r--r--- k204221 bm0146 1.2M 10 Jun 2020 INDEX.txt
-rw-r--r--t k204221 bm0146 19.5G 05 Jun 2020 out_data_001.tar
-rw-r--r--t k204221 bm0146 19.0G 05 Jun 2020 out_data_002.tar
-rw-r--r--t k204221 bm0146 19.4G 05 Jun 2020 out_data_003.tar
-rw-r--r--t k204221 bm0146 19.3G 05 Jun 2020 out_data_004.tar
-rw-r--r--t k204221 bm0146 19.1G 05 Jun 2020 out_data_005.tar
-rw-r--r--t k204221 bm0146 7.8G 05 Jun 2020 out_data_006.tar
Files: 7
$ slk search '{"$and": [{"path": {"$gte": "/ex/am/ple/data"}}, {"resources.name": {"$regex": "out_data_00[256].tar"}}]}'
Search continuing. ..
Search ID: 65621
$ slk list 65621 | cat
-rw-r--r--t k204221 bm0146 19.0G 05 Jun 2020 out_data_002.tar
-rw-r--r--t k204221 bm0146 19.1G 05 Jun 2020 out_data_005.tar
-rw-r--r--t k204221 bm0146 7.8G 05 Jun 2020 out_data_006.tar
Files: 3
$ slk retrieve 65621 /tar/get/folder
[ ] 100% complete. Files retrieved: 3/3, [45.9G/45.9G].
$ ls /tar/get/folder
out_data_002.tar out_data_005.tar out_data_006.tar
In this example, the namespace /ex/am/ple/data and all of its sub-namespaces would be searched recursively. If the user wants the search to be performed non-recursively, only in /ex/am/ple/data, then the operator $max_depth can be used.
$ slk search '{"$and": [{"path": {"$gte": "/ex/am/ple/data", "$max_depth": 1}}, {"resources.name": {"$regex": "out_data_00[256].tar"}}]}'
Search continuing. ..
Search ID: 65622
Instead of formulating the search query string yourself, you can use the command slk_helpers gen_file_query to generate such a query string.
# non-recursive search query string
$ slk_helpers gen_file_query /ex/am/ple/data/out_data_00[256].tar
'{"$and": [{"path": {"$gte": "/ex/am/ple/data", "$max_depth": 1}}, {"resources.name": {"$regex": "out_data_00[256].tar"}}]}'
# recursive search query string
$ slk_helpers gen_file_query -R /ex/am/ple/data/out_data_00[256].tar
'{"$and": [{"path": {"$gte": "/ex/am/ple/data"}}, {"resources.name": {"$regex": "out_data_00[256].tar"}}]}'
gen_file_query can only generate search query strings to search for resources (files/namespaces) by their name (field: resources.name) and/or location (path).
See also
More gen_file_query example applications are given here.
Note
Since slk 3.3.67, slk retrieve SEARCH_ID TARGET_FOLDER will recreate the original folder hierarchy of each retrieved file in TARGET_FOLDER. If a file was located in /arch/ab1234/c567890, it will be retrieved to TARGET_FOLDER/arch/ab1234/c567890. Prior to slk 3.3.67 (e.g. slk 3.3.21), all files retrieved via one SEARCH_ID were written into the same folder TARGET_FOLDER. This caused problems if several files with the same name but different source locations were retrieved.
Use group_files_by_tape – basics#
The command group_files_by_tape receives one or more files and checks onto which tapes the files are written and whether they are currently in the HSM cache. Depending on the provided parameters, it just prints the number of tapes, a list of files per tape, or even runs a search query per tape whose result can be used by other commands like slk retrieve. Since group_files_by_tape is relatively cumbersome to type, the short form gfbt is available. gfbt accepts files, file lists, search ids and much more as input, as listed further below.
If you want to count the number of tapes on which your files are stored, run gfbt with --count-tapes. Files in the cache are ignored.
$ slk_helpers gfbt /arch/bm0146/k204221/iow -R --count-tapes
10 tapes with single-tape files
0 tapes with multi-tape files
Some files are split into two parts which are stored on multiple tapes. gfbt treats these files differently from files which are stored as one part on one tape.
If you want to get an overview of the number of files stored per tape and/or of the tape statuses, run gfbt with --details --count-files:
$ slk_helpers gfbt /arch/bm0146/k204221/iow/ -R --details --count-files
cached (AVAILABLE ): 1
C25543L6 (AVAILABLE ): 1
C25566L6 (AVAILABLE ): 2
M12208M8 (AVAILABLE ): 3
M20471M8 (AVAILABLE ): 1
M12211M8 (AVAILABLE ): 4
C25570L6 (AVAILABLE ): 1
M12215M8 (AVAILABLE ): 5
C25539L6 (ERRORSTATE): 2
B09208L5 (BLOCKED ): 1
M12217M8 (AVAILABLE ): 2
The alphanumeric string in the first column is the tape barcode, which you can ignore in most cases. cached (first row) contains all files which are currently in the cache. The string in brackets is the tape status (see also Tape Stati):
AVAILABLE => tape available for retrieval/recall
BLOCKED => tape is blocked by a write job; please try later
ERRORSTATE => tape is in a bad state; please contact support@dkrz.de and the tape will be reset
If you want to run a search per tape, run gfbt with --full:
$ slk_helpers gfbt /arch/bm0146/k204221/iow/ -R --full
cached (AVAILABLE ): 417725
C25543L6 (AVAILABLE ): 417715
C25566L6 (AVAILABLE ): 417716
M12208M8 (AVAILABLE ): 417717
M20471M8 (AVAILABLE ): 417718
M12211M8 (AVAILABLE ): 417719
C25570L6 (AVAILABLE ): 417720
M12215M8 (AVAILABLE ): 417721
C25539L6 (AVAILABLE ): 417722
B09208L5 (AVAILABLE ): 417723
M12217M8 (AVAILABLE ): 417724
The numbers in the rightmost column are the search ids of the performed searches. You can list the search results via slk list and use them with slk retrieve, slk recall and our respective wrapper scripts.
You can also perform one search for two tapes (but not more) as follows:
$ slk_helpers gfbt /arch/bm0146/k204221/iow/ -R --full --set-max-tape-number-per-search 2
cached (AVAILABLE ): 421699
C25543L6 (AVAILABLE ): 421700
C25566L6 (AVAILABLE ): 421700
M12208M8 (AVAILABLE ): 421701
M12211M8 (AVAILABLE ): 421701
C25570L6 (AVAILABLE ): 421702
M12215M8 (AVAILABLE ): 421702
C25539L6 (AVAILABLE ): 421703
B09208L5 (AVAILABLE ): 421703
M12217M8 (AVAILABLE ): 421704
gfbt accepts different types of input as follows:
# one file as input
$ slk_helpers gfbt /arch/bm0146/k204221/iow/iow_data2_001.tar
# file list as input
$ slk_helpers gfbt /arch/bm0146/k204221/iow/iow_data2_001.tar /arch/bm0146/k204221/iow/iow_data2_002.tar
# directory/namespace as input (position of `-R` is not relevant); also multiple namespaces are possible
$ slk_helpers gfbt -R /arch/bm0146/k204221/iow
$ slk_helpers gfbt /arch/bm0146/k204221/iow -R
$ slk_helpers gfbt -R /arch/bm0146/k204221/iow /arch/bm0146/k204221/iow2
# path with a regular expression in the filename
$ slk_helpers gfbt /arch/bm0146/k204221/iow/iow_data2_00[0-9].tar
# search id (only one search id; not more than one)
$ slk_helpers gfbt --search-id 123456
# search query (only one search query; not more than one)
$ slk_helpers gfbt --search-query '{"path": {"$gte": "/arch/bm0146/k204221/iow"}}'
Note
The command slk_helpers gfbt /arch/bm0146/k204221/iow/iow_data2_001.tar -R will look recursively for iow_data2_001.tar in /arch/bm0146/k204221/iow.
Use group_files_by_tape – advanced#
A quick description of the basic features of gfbt is given above. It looks up which files are stored in the HSM cache and which are not stored in the HSM cache but only on tape. Files on tape are grouped by tape: each line of the output contains all files which are on one tape. Some files are split into multiple parts, which are stored on different tapes. These files are listed in an extra row multi-tape. Finally, theoretically there might be files without any storage information. These files would be listed in another row starting with not stored.
--smtnps is a short form of --set-max-tape-number-per-search.
The user can directly create a search query for retrieving all files from one tape (--gen-search-query) or directly run this search (--run-search-query). With the parameter --set-max-tape-number-per-search <N>, the number of tapes over which a search query is generated, or over which a search is performed, can be increased. If a search is performed, one search id is printed per tape. If the user wants to know the tape barcode and the tape status, she/he might use --print-tape-barcode and --print-tape-status, respectively. There are some parameters (--details, --full) which imply several other parameters.
We have this example data:
$ slk list /arch/bm0146/k204221/iow
-rw-r--r--- k204221 bm0146 1.2M 10 Jun 2020 08:25 INDEX.txt
-rw-r--r--t k204221 ka1209 19.5G 05 Jun 2020 17:36 iow_data2_001.tar
-rw-r--r--t k204221 bm0146 19.0G 05 Jun 2020 17:38 iow_data2_002.tar
-rw-r--r--t k204221 bm0146 19.4G 05 Jun 2020 17:38 iow_data2_003.tar
-rw-r--r--t k204221 bm0146 19.3G 05 Jun 2020 17:40 iow_data2_004.tar
-rw-r--r--t k204221 bm0146 19.1G 05 Jun 2020 17:40 iow_data2_005.tar
-rw-r--r--t k204221 bm0146 7.8G 05 Jun 2020 17:41 iow_data2_006.tar
-rw-r--r--t k204221 bm0146 186.9G 05 Jun 2020 19:37 iow_data3_001.tar
-rw-r--r--t k204221 bm0146 24.6G 05 Jun 2020 19:14 iow_data3_002.tar
-rw-r--r--t k204221 bm0146 4.0M 05 Jun 2020 19:43 iow_data4_001.tar
-rw-r--r--t k204221 bm0146 10.5G 05 Jun 2020 19:46 iow_data4_002.tar
-rw-r--r--t k204221 bm0146 19.5G 10 Jun 2020 08:21 iow_data5_001.tar
-rw-r--r--t k204221 bm0146 19.0G 10 Jun 2020 08:23 iow_data5_002.tar
-rw-r--r--t k204221 bm0146 19.4G 10 Jun 2020 08:23 iow_data5_003.tar
-rw-r--r--t k204221 bm0146 19.3G 10 Jun 2020 08:24 iow_data5_004.tar
-rw-r--r--t k204221 bm0146 19.1G 10 Jun 2020 08:25 iow_data5_005.tar
-rw-r--r--t k204221 bm0146 7.8G 10 Jun 2020 08:25 iow_data5_006.tar
-rw-r--r--t k204221 bm0146 19.5G 05 Jun 2020 17:53 iow_data_001.tar
-rw-r--r--t k204221 bm0146 19.0G 05 Jun 2020 17:53 iow_data_002.tar
-rw-r--r--t k204221 bm0146 19.4G 05 Jun 2020 17:56 iow_data_003.tar
-rw-r--r--t k204221 bm0146 19.3G 05 Jun 2020 17:56 iow_data_004.tar
-rw-r--r--t k204221 bm0146 19.1G 05 Jun 2020 17:58 iow_data_005.tar
-rw-r-----t k204221 bm0146 7.8G 05 Jun 2020 17:57 iow_data_006.tar
Files: 23
If you want to count the number of tapes on which your files are stored, run gfbt with --count-tapes. Files in the cache are ignored.
$ slk_helpers gfbt /arch/bm0146/k204221/iow -R --count-tapes
10 tapes with single-tape files
0 tapes with multi-tape files
We do a basic grouping of all files. All files listed in one row starting with tape are stored on one tape. The row cached contains files which are currently in the HSM cache.
$ slk_helpers group_files_by_tape -R /arch/bm0146/k204221/iow
cached: /arch/bm0146/k204221/iow/INDEX.txt
tape: /arch/bm0146/k204221/iow/iow_data_006.tar
tape: /arch/bm0146/k204221/iow/iow_data5_006.tar /arch/bm0146/k204221/iow/iow_data5_002.tar
tape: /arch/bm0146/k204221/iow/iow_data_001.tar /arch/bm0146/k204221/iow/iow_data3_002.tar /arch/bm0146/k204221/iow/iow_data2_004.tar
tape: /arch/bm0146/k204221/iow/iow_data2_001.tar
tape: /arch/bm0146/k204221/iow/iow_data_002.tar /arch/bm0146/k204221/iow/iow_data5_005.tar /arch/bm0146/k204221/iow/iow_data3_001.tar /arch/bm0146/k204221/iow/iow_data2_003.tar
tape: /arch/bm0146/k204221/iow/iow_data5_003.tar
tape: /arch/bm0146/k204221/iow/iow_data_005.tar /arch/bm0146/k204221/iow/iow_data_004.tar /arch/bm0146/k204221/iow/iow_data5_004.tar /arch/bm0146/k204221/iow/iow_data5_001.tar /arch/bm0146/k204221/iow/iow_data2_002.tar
tape: /arch/bm0146/k204221/iow/iow_data_003.tar /arch/bm0146/k204221/iow/iow_data4_002.tar
tape: /arch/bm0146/k204221/iow/iow_data4_001.tar
tape: /arch/bm0146/k204221/iow/iow_data2_006.tar /arch/bm0146/k204221/iow/iow_data2_005.tar
If we wish to know the unique id of each tape, we use --print-tape-id. --print-tape-id exists for historical reasons; the tape barcode (--print-tape-barcode) might be more useful for most applications.
$ slk_helpers group_files_by_tape -R /arch/bm0146/k204221/iow --print-tape-id
cached: /arch/bm0146/k204221/iow/INDEX.txt
75696: /arch/bm0146/k204221/iow/iow_data_006.tar
75719: /arch/bm0146/k204221/iow/iow_data5_006.tar /arch/bm0146/k204221/iow/iow_data5_002.tar
130870: /arch/bm0146/k204221/iow/iow_data_001.tar /arch/bm0146/k204221/iow/iow_data3_002.tar /arch/bm0146/k204221/iow/iow_data2_004.tar
132453: /arch/bm0146/k204221/iow/iow_data2_001.tar
130873: /arch/bm0146/k204221/iow/iow_data_002.tar /arch/bm0146/k204221/iow/iow_data5_005.tar /arch/bm0146/k204221/iow/iow_data3_001.tar /arch/bm0146/k204221/iow/iow_data2_003.tar
75723: /arch/bm0146/k204221/iow/iow_data5_003.tar
130877: /arch/bm0146/k204221/iow/iow_data_005.tar /arch/bm0146/k204221/iow/iow_data_004.tar /arch/bm0146/k204221/iow/iow_data5_004.tar /arch/bm0146/k204221/iow/iow_data5_001.tar /arch/bm0146/k204221/iow/iow_data2_002.tar
75692: /arch/bm0146/k204221/iow/iow_data_003.tar /arch/bm0146/k204221/iow/iow_data4_002.tar
56317: /arch/bm0146/k204221/iow/iow_data4_001.tar
130879: /arch/bm0146/k204221/iow/iow_data2_006.tar /arch/bm0146/k204221/iow/iow_data2_005.tar
If we wish to know whether a tape is currently available for reading, we use --print-tape-status. We might also use --details instead of the other two parameters.
$ slk_helpers group_files_by_tape -R /arch/bm0146/k204221/iow --print-tape-id --print-tape-status
cached (AVAILABLE ): /arch/bm0146/k204221/iow/INDEX.txt
75696 (AVAILABLE ): /arch/bm0146/k204221/iow/iow_data_006.tar
75719 (AVAILABLE ): /arch/bm0146/k204221/iow/iow_data5_006.tar /arch/bm0146/k204221/iow/iow_data5_002.tar
130870 (AVAILABLE ): /arch/bm0146/k204221/iow/iow_data_001.tar /arch/bm0146/k204221/iow/iow_data3_002.tar /arch/bm0146/k204221/iow/iow_data2_004.tar
132453 (AVAILABLE ): /arch/bm0146/k204221/iow/iow_data2_001.tar
130873 (AVAILABLE ): /arch/bm0146/k204221/iow/iow_data_002.tar /arch/bm0146/k204221/iow/iow_data5_005.tar /arch/bm0146/k204221/iow/iow_data3_001.tar /arch/bm0146/k204221/iow/iow_data2_003.tar
75723 (AVAILABLE ): /arch/bm0146/k204221/iow/iow_data5_003.tar
130877 (AVAILABLE ): /arch/bm0146/k204221/iow/iow_data_005.tar /arch/bm0146/k204221/iow/iow_data_004.tar /arch/bm0146/k204221/iow/iow_data5_004.tar /arch/bm0146/k204221/iow/iow_data5_001.tar /arch/bm0146/k204221/iow/iow_data2_002.tar
75692 (BLOCKED ): /arch/bm0146/k204221/iow/iow_data_003.tar /arch/bm0146/k204221/iow/iow_data4_002.tar
56317 (ERRORSTATE): /arch/bm0146/k204221/iow/iow_data4_001.tar
130879 (AVAILABLE ): /arch/bm0146/k204221/iow/iow_data2_006.tar /arch/bm0146/k204221/iow/iow_data2_005.tar
This output would mean that tape 75692 is blocked because StrongLink is currently writing data onto this tape. It is blocked for retrievals until the write process is finished. The tape 56317 is in an error state which needs to be resolved by the DKRZ or StrongLink support. Please contact support@dkrz.de if you experience this. We regularly check for tapes in an error state but we cannot search for them explicitly.
If we want group_files_by_tape to directly generate search queries, we use --gen-search-query:
$ slk_helpers group_files_by_tape -R /arch/bm0146/k204221/iow --print-tape-id --gen-search-query
cached: {"$and":[{"path":{"$gte":"/arch/bm0146/k204221/iow","$max_depth":1}},{"resources.name":{"$regex":"INDEX.txt"}}]}
75696: {"$and":[{"path":{"$gte":"/arch/bm0146/k204221/iow","$max_depth":1}},{"resources.name":{"$regex":"iow_data_006.tar"}}]}
75719: {"$and":[{"path":{"$gte":"/arch/bm0146/k204221/iow","$max_depth":1}},{"resources.name":{"$regex":"iow_data5_006.tar|iow_data5_002.tar"}}]}
130870: {"$and":[{"path":{"$gte":"/arch/bm0146/k204221/iow","$max_depth":1}},{"resources.name":{"$regex":"iow_data_001.tar|iow_data3_002.tar|iow_data2_004.tar"}}]}
132453: {"$and":[{"path":{"$gte":"/arch/bm0146/k204221/iow","$max_depth":1}},{"resources.name":{"$regex":"iow_data2_001.tar"}}]}
130873: {"$and":[{"path":{"$gte":"/arch/bm0146/k204221/iow","$max_depth":1}},{"resources.name":{"$regex":"iow_data_002.tar|iow_data5_005.tar|iow_data3_001.tar|iow_data2_003.tar"}}]}
75723: {"$and":[{"path":{"$gte":"/arch/bm0146/k204221/iow","$max_depth":1}},{"resources.name":{"$regex":"iow_data5_003.tar"}}]}
130877: {"$and":[{"path":{"$gte":"/arch/bm0146/k204221/iow","$max_depth":1}},{"resources.name":{"$regex":"iow_data_005.tar|iow_data_004.tar|iow_data5_004.tar|iow_data5_001.tar|iow_data2_002.tar"}}]}
75692: {"$and":[{"path":{"$gte":"/arch/bm0146/k204221/iow","$max_depth":1}},{"resources.name":{"$regex":"iow_data_003.tar|iow_data4_002.tar"}}]}
56317: {"$and":[{"path":{"$gte":"/arch/bm0146/k204221/iow","$max_depth":1}},{"resources.name":{"$regex":"iow_data4_001.tar"}}]}
130879: {"$and":[{"path":{"$gte":"/arch/bm0146/k204221/iow","$max_depth":1}},{"resources.name":{"$regex":"iow_data2_006.tar|iow_data2_005.tar"}}]}
The generated search queries are run directly if --run-search-query is set. The search ids are printed and can be used with slk retrieve (see Run a search query and retrieve search results, above) and/or slk recall. We might set --full, which does nearly the same as these parameters together.
$ slk_helpers group_files_by_tape -R /arch/bm0146/k204221/iow --print-tape-id --run-search-query
cached: 252772
75696: 252762
75719: 252763
130870: 252764
132453: 252765
130873: 252766
75723: 252767
130877: 252768
75692: 252769
56317: 252770
130879: 252771
If we want to run one search for every two tapes together, we set --set-max-tape-number-per-search 2:
$ slk_helpers group_files_by_tape -R /arch/bm0146/k204221/iow --print-tape-id --run-search-query --set-max-tape-number-per-search 2
cached: 252779
75696: 252773
75719: 252773
130870: 252774
132453: 252774
130873: 252775
75723: 252776
130877: 252775
75692: 252777
56317: 252777
130879: 252778
Please note that this defines the maximum number of tapes per search but does not force this number. Under certain conditions, a search will be performed for only one tape (e.g. tape 75723 in this example). If you wish to extract the unique search ids, please use awk and sort -u as follows:
$ slk_helpers group_files_by_tape -R /arch/bm0146/k204221/iow --print-tape-id --run-search-query --set-max-tape-number-per-search 2 | awk '{ print $2 }' | sort -u
252773
252774
252775
252776
252777
252778
252779
The awk call might change depending on the options you use.
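For instance, if additional columns are printed (e.g. with --print-tape-barcode and --print-tape-status, or with --full), the search id is the last field of each line, so a hedged alternative is to select that field instead of a fixed column:
$ slk_helpers gfbt /arch/bm0146/k204221/iow -R --full | awk '{ print $NF }' | sort -u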
The options --json and --json-pretty might be useful if you want to process the output of gfbt in other tools.
You might then use the script retrieve_slurm_template_search_result_02arguments.sh from below to run slk retrieve in a batch job. You can also submit slk retrieve directly with sbatch, but this script will print some additional information into the SLURM job log:
# retrieval for search id 252765
$ sbatch --account=XY1234 ./retrieve_slurm_template_search_result_02arguments.sh 252765 /my/target/folder
To run slk retrieve for all search ids generated by slk_helpers group_files_by_tape, you can run this script in a loop:
# solution 1: only possible for consecutive search ids:
$ for id in `seq 252762 252772`; do echo "submitting search ${id}"; sbatch --account=XY1234 ./retrieve_slurm_template_search_result_02arguments.sh $id /my/target/folder; done
# solution 2:
$ for id in 252762 252763 252764 252765 252766 252767 252768 252769 252770 252771 252772; do echo "submitting search ${id}"; sbatch --account=XY1234 ./retrieve_slurm_template_search_result_02arguments.sh $id /my/target/folder; done
# solution 3:
# * might take some time if many files are to be searched
# * search ids are newly generated each time group_files_by_tape runs
$ for id in `slk_helpers group_files_by_tape -R /arch/bm0146/k204221/iow --run-search-query | awk ' { print $2 } '`; do echo "submitting search ${id}"; sbatch --account=XY1234 ./retrieve_slurm_template_search_result_02arguments.sh $id /my/target/folder; done
Striping#
TL;DR#
We recommend setting striping for your target directory if the files which you want to retrieve are 1 GB in size or larger. This will speed up the copying process from the HSM cache to the Lustre file system. Different striping settings are possible. The setting presented here is quite reasonable for applications at the DKRZ.
First, check whether striping is already properly set. This is done with lfs getstripe -d PATH. In the example below, the folder example01 is not striped and the folder example02 is striped. If the folder is properly striped like example02, nothing needs to be done.
$ lfs getstripe -d example01
stripe_count: 1 stripe_size: 1048576 pattern: 0 stripe_offset: -1
$ lfs getstripe -d example02
lcm_layout_gen: 0
lcm_mirror_count: 1
lcm_entry_count: 3
lcme_id: N/A
lcme_mirror_id: N/A
lcme_flags: 0
lcme_extent.e_start: 0
lcme_extent.e_end: 1073741824
stripe_count: 1 stripe_size: 1048576 pattern: raid0 stripe_offset: -1
lcme_id: N/A
lcme_mirror_id: N/A
lcme_flags: 0
lcme_extent.e_start: 1073741824
lcme_extent.e_end: 4294967296
stripe_count: 4 stripe_size: 1048576 pattern: raid0 stripe_offset: -1
lcme_id: N/A
lcme_mirror_id: N/A
lcme_flags: 0
lcme_extent.e_start: 4294967296
lcme_extent.e_end: EOF
stripe_count: 16 stripe_size: 1048576 pattern: raid0 stripe_offset: -1
If the optimal striping is not set, you can set it with lfs setstripe ... as follows:
$ mkdir example03
$ lfs setstripe -E 1G -c 1 -S 1M -E 4G -c 4 -S 1M -E -1 -c 8 -S 1M example03
Please check afterwards with lfs getstripe -d example03 whether the striping setting looks as described above. If yes, all files which are written into the folder example03 will be automatically striped as defined. All files which existed in the folder before lfs setstripe ... was applied will keep their old striping.
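A minimal sketch that only applies the recommended layout when the target folder does not already use a composite layout; the folder is a placeholder and the grep check is a simple heuristic based on the lfs getstripe output shown above:
target_folder=/work/xz1234/ex/am/ple    # placeholder
mkdir -p ${target_folder}
# composite (progressive file layout) folders report 'lcm_layout_gen'; plainly striped folders do not
if ! lfs getstripe -d ${target_folder} | grep -q lcm_layout_gen; then
    lfs setstripe -E 1G -c 1 -S 1M -E 4G -c 4 -S 1M -E -1 -c 8 -S 1M ${target_folder}
fi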
Details#
On a Lustre file system, the content of files is served from so-called object storage targets (OSTs). Lustre allows configuring per file or per folder onto how many OSTs a file’s content is split. This configuration option is denoted as striping. striping=1 means that the file’s content is served by one OST only. striping=4 (for example) means that the file’s content is split into four distinct parts which are served by four OSTs, one part per OST. The general advantages and disadvantages of striping are not described here. There are numerous online sources on this topic.
The default configuration on the Lustre file system of Levante has still to be decided. For the time being, we suggest that users use a striping setting denoted as progressive file layout. This means that the striping factor is selected automatically depending on the file size. The proposal is to set striping to 1 for files smaller than or equal to 1 GB, to 4 for files between 1 GB and 4 GB in size, and to 16 for files larger than 4 GB. The command to do this looks as follows:
$ lfs setstripe -E 1G -c 1 -S 1M -E 4G -c 4 -S 1M -E -1 -c 8 -S 1M TARGET_FOLDER
Waiting and processing time of retrievals#
Background#
The number of tape drives limits the number of tapes from which data can be read in parallel. All newly archived data are written onto tapes of the newest available type. All data that have been archived or retrieved since the StrongLink system went online in Oct/Nov 2021 are stored on this tape type. Currently, approximately 20 tape drives for this type of tape and approximately 50 drives for older tape types are available. When much data are archived and have to be written to tape, more than half of these 20 tape drives may be allocated for writing. A new tape library with additional tape drives has been ordered and is planned to be commissioned in the first half of 2023. Until then, there is a bottleneck for retrieving data which have been archived or accessed in the past year, particularly when much data are archived in parallel. There is no bottleneck with respect to tape drives when data which have not been touched since Nov 2021 are to be retrieved.
StrongLink considers each process which accesses a tape as a job. Each job has a unique id. A job which reads from a tape is denoted as a recall job. If a new job comes in and a tape drive is free, StrongLink will start processing this job. New jobs will be queued if all tape drives are allocated to other jobs. This queue is independent of the SLURM queue on Levante. There is no prioritization of jobs. However, jobs are not always processed on a first-in, first-out basis. For example: a recall job A was submitted first, followed by recall job B, followed by recall job C. Jobs A and C need to recall a file from the same tape. When StrongLink reads this particular tape for job A, it will get the data for job C as well.
Each recall job can use a limited number of tape drives to read data from tape. Currently (Jan 2023), this value is set to 2. This might change without notification depending on the system load and will not be instantly updated here. Each tape drive of the newest available generation can reach a transfer rate of up to 300 MB/s. Thus, 176 GB of data can be read in 10 minutes when the conditions are optimal. When the data are not stored at the beginning of the tape but somewhere in the middle, the tape drive needs to spool the tape to the appropriate position, which takes time. Additionally, tapes have to be taken by a robot arm from the library slot to the tape drive in advance, which might take up to one minute.
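A rough, hedged estimate of the pure read time, ignoring spooling, tape moves and queue waiting times and assuming a single drive of the newest generation at 300 MB/s:
$ data_size_gb=500    # placeholder: amount of data to be read, in GB
$ echo "approx. $(( data_size_gb * 1000 / 300 / 60 )) minutes at 300 MB/s with one drive"
approx. 27 minutes at 300 MB/s with one drive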
Check job and queue status#
Check the StrongLink-internal queue via slk_helpers job_queue:
$ slk_helpers job_queue
total read jobs: 110
active read jobs: 12
queued read jobs: 98
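If you want to keep an eye on the queue before starting a large recall, a simple periodic check is sufficient (the 300-second interval is an arbitrary choice):
$ watch -n 300 slk_helpers job_queue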
If you run slk retrieve (and at least one file needs to be read from tape) or slk recall, the command will print the id of the corresponding recall job to the slk log (~/.slk/slk-cli.log; 84835 in the example below):
2023-01-12 09:45:10 xU22 2036 INFO Executing command: "recall 275189"
2023-01-12 09:45:11 xU22 2036 INFO Created copy job with id: '84835' for - "recall 275189"
The status of a job is printed via slk_helpers job_status JOB_ID:
$ slk_helpers job_status 84835
SUCCESSFUL
$ slk_helpers job_status 84980
QUEUED (12)
$ slk_helpers job_status 84981
QUEUED (13)
$ slk_helpers job_status 84966
PROCESSING
The status can be QUEUED ([PLACE_IN_THE_QUEUE]), PROCESSING, SUCCESSFUL, FAILED or ABORTED.
As with SLURM jobs, we cannot provide the average processing or waiting time of a retrieval/recall job. However, based on the information provided in the Background section above, you can estimate how long the pure retrieval might take.
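A minimal polling sketch that waits for a recall job to finish before retrieving from the cache (the job id, the paths and the sleep interval are placeholders; this is essentially what the wrapper script described below does via SLURM jobs):
job_id=84835                           # placeholder: id taken from ~/.slk/slk-cli.log
while true; do
    status=$(slk_helpers job_status ${job_id})
    echo "recall job ${job_id}: ${status}"
    case "${status}" in
        SUCCESSFUL) break ;;
        FAILED|ABORTED) >&2 echo "recall job did not finish successfully"; exit 1 ;;
        *) sleep 600 ;;                # still QUEUED (...) or PROCESSING
    esac
done
slk retrieve -s /ex/am/ple/data /tar/get/folder    # placeholder paths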
Retrieval wrapper for SLURM#
We provide a few wrapper scripts for slk retrieve and slk recall as part of the slk module on Levante. The core wrapper script is called slk_wrapper_recall_wait_retrieve. The argument --help prints details on the usage:
$ slk_wrapper_recall_wait_retrieve --help
usage:
slk_wrapper_recall_wait_retrieve <account> <source_path> <destination_path> <suffix_logfile>
useful log files:
slk log file: ~/.slk/slk-cli.log
wrapper log file: rwr_log_<suffix_logfile>.log
<account> has to be a DKRZ project account with allocated compute time. Your account has to be allowed to run SLURM jobs on Levante.
<source_path> can be a search id, a path pointing to a namespace or a path pointing to a resource. The wrapper script automatically starts recursive recalls and retrievals. However, it does not split the files by tape. If you wish to combine this wrapper with slk_helpers group_files_by_tape, please have a look into this example.
<destination_path> is the destination path of the retrieval.
<suffix_logfile>: The script automatically creates a log file rwr_log_<suffix_logfile>.log into which relevant output from this script and from child scripts is written.
What does this script do? If the files do not need to be recalled but are stored in the HSM cache, a retrieval is started directly. Otherwise, it submits a SLURM job which runs slk recall. The id of the StrongLink recall job is extracted and a new “waiter job” is submitted to SLURM, delayed by one hour. After one hour, this “waiter job” starts and checks the status of the recall job with the given id. If the recall job …
… was successful, a retrieval job is started to copy the file(s) from HSM cache to the Lustre filesystem.
… failed, error information is printed to the log file and the script terminates.
… is still running or queued, the waiter job submits itself again with another hour of delay.
Note
If the StrongLink system is under high load and many timeouts occur, the slk-recall-wait-retrieve wrapper might fail intermittently.
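A hedged invocation example (project account, paths and the log-file suffix are placeholders):
$ slk_wrapper_recall_wait_retrieve xz1234 /arch/xz1234/ex/am/ple /work/xz1234/ex/am/ple run01
$ cat rwr_log_run01.log    # wrapper log file created by the script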
Retrieval script templates#
Several script templates for different use cases are printed below and available for download:
several retrievals of single files:
retrieve_slurm_template_single_files.sh
one recursive retrieval of a whole namespace:
retrieve_slurm_template_recursive.sh
search and retrieval of search results:
retrieve_slurm_template_search.sh
auto-generate a search query, run it and retrieve search results:
retrieve_slurm_template_gen_search.sh
script with input arguments: insert search id and target folder:
retrieve_slurm_template_search_result_02arguments.sh
When you use these templates, you need to make a few adaptations:
replace /work/xz1234/ex/am/ple by the actual target folder on the Lustre file system
replace xz1234 in --account=xz1234 by your project account name
replace /path/to/your/archived/ by the namespace path to your data on the HSM
replace /arch/bm0146/k204221/.*.nc by what you want to search for (only retrieve_slurm_template_gen_search.sh); Note: .* is a regular expression (RegEx), not a wildcard/glob. The search works only with RegEx and not with wildcards.
Please run/submit these scripts via sbatch as described in Run slk as batch job and SLURM Introduction.
several retrievals of single files#
#!/bin/bash
# HINT:
# * You can change the values right of the "=" as you wish.
# * The "%j" in the log file names means that the job id will be inserted
#SBATCH --job-name=test_slk_retr_job # Specify job name
#SBATCH --output=test_job.o%j # name for standard output log file
#SBATCH --error=test_job.e%j # name for standard error output log
#SBATCH --partition=shared # partition name
#SBATCH --ntasks=1 # max. number of tasks to be invoked
#SBATCH --time=08:00:00 # Set a limit on the total run time
#SBATCH --account=xz1234 # Charge resources on this project
#SBATCH --mem=6GB
# make 'module' available when script is submitted from certain environments
source /sw/etc/profile.levante
# ~~~~~~~~~~~~ preparation ~~~~~~~~~~~~
module load slk
# set target folder for retrieval
target_folder=/work/xz1234/ex/am/ple
# create folder to retrieve into (target folder)
mkdir -p ${target_folder}
# set striping for target folder
# see https://docs.dkrz.de/doc/hsm/striping.html
# ON LEVANTE
lfs setstripe -E 1G -c 1 -S 1M -E 4G -c 4 -S 1M -E -1 -c 8 -S 1M ${target_folder}
# ON MISTRAL
#lfs setstripe -S 4M -c 8 ${target_folder}
# ~~~~~~~~~~~~ retrievals ~~~~~~~~~~~~
# do the retrieval
echo "doing 'slk retrieve'"
# ~~~~~~~~~~~~ doing single-file retrievals ~~~~~~~~~~~~
# You can do multiple retrievals in one script, but based on our
# experience, 10 to 15 separate retrievals can already fill the
# 08:00 hours maximum run time of the SLURM job. Therefore, it
# is recommended to structure your retrieval scripts as follows:
# first retrieval and capture exit code (get $? in line after slk command)
slk retrieve /path/to/your/archived/file1 ${target_folder}
if [ $? -ne 0 ]; then
>&2 echo "an error occurred in slk retrieve call"
else
echo "retrieval successful"
fi
# second retrieval and capture exit code (get $? in line after slk cmd)
slk retrieve /path/to/your/archived/file2 ${target_folder}
if [ $? -ne 0 ]; then
>&2 echo "an error occurred in slk retrieve call"
else
echo "retrieval successful"
fi
# ...
# ...
# fifteenth retrieval and capture exit code (get $? in line after slk cmd)
slk retrieve /path/to/your/archived/file15 ${target_folder}
if [ $? -ne 0 ]; then
>&2 echo "an error occurred in slk retrieve call"
else
echo "retrieval successful"
fi
one recursive retrieval of a whole namespace#
#!/bin/bash
# HINT:
# * You can change the values right of the "=" as you wish.
# * The "%j" in the log file names means that the job id will be inserted
#SBATCH --job-name=test_slk_retr_job # Specify job name
#SBATCH --output=test_job.o%j # name for standard output log file
#SBATCH --error=test_job.e%j # name for standard error output log
#SBATCH --partition=shared # partition name
#SBATCH --ntasks=1 # max. number of tasks to be invoked
#SBATCH --time=08:00:00 # Set a limit on the total run time
#SBATCH --account=xz1234 # Charge resources on this project
#SBATCH --mem=6GB
# make 'module' available when script is submitted from certain environments
source /sw/etc/profile.levante
# ~~~~~~~~~~~~ preparation ~~~~~~~~~~~~
module load slk
# set target folder for retrieval
target_folder=/work/xz1234/ex/am/ple
# create folder to retrieve into (target folder)
mkdir -p ${target_folder}
# set striping for target folder
# see https://docs.dkrz.de/doc/hsm/striping.html
# ON LEVANTE
lfs setstripe -E 1G -c 1 -S 1M -E 4G -c 4 -S 1M -E -1 -c 8 -S 1M ${target_folder}
# ON MISTRAL
#lfs setstripe -S 4M -c 8 ${target_folder}
# ~~~~~~~~~~~~ doing recursive retrievals ~~~~~~~~~~~~
# If you wish to retrieve all files from a specific folder, you can use
# the recursive option (-R). Please use this option only if you need more
# than 90% of the files in a certain namespace directory, and don't
# retrieve a whole directory of e.g. 500 files for only 20 files. An
# example for recursive retrieval is as follows:
slk retrieve -R /path/to/your/archived/directory ${target_folder}
# '$?' captures the exit code of the previous command (you can put it in
# the next line after each slk command).
if [ $? -ne 0 ]; then
>&2 echo "an error occurred in slk retrieve call"
else
echo "retrieval successful"
fi
search and retrieval of search results#
#!/bin/bash
# HINT:
# * You can change the values right of the "=" as you wish.
# * The "%j" in the log file names means that the job id will be inserted
#SBATCH --job-name=test_slk_retr_job # Specify job name
#SBATCH --output=test_job.o%j # name for standard output log file
#SBATCH --error=test_job.e%j # name for standard error output log
#SBATCH --partition=shared # partition name
#SBATCH --ntasks=1 # max. number of tasks to be invoked
#SBATCH --time=08:00:00 # Set a limit on the total run time
#SBATCH --account=xz1234 # Charge resources on this project
#SBATCH --mem=6GB
# make 'module' available when script is submitted from certain environments
source /sw/etc/profile.levante
# ~~~~~~~~~~~~ preparation ~~~~~~~~~~~~
module load slk
# set target folder for retrieval
target_folder=/work/xz1234/ex/am/ple
# create folder to retrieve into (target folder)
mkdir -p ${target_folder}
# set striping for target folder
# see https://docs.dkrz.de/doc/hsm/striping.html
# ON LEVANTE
lfs setstripe -E 1G -c 1 -S 1M -E 4G -c 4 -S 1M -E -1 -c 8 -S 1M ${target_folder}
# ON MISTRAL
#lfs setstripe -S 4M -c 8 ${target_folder}
# ~~~~~~~~~~~~ doing the search ~~~~~~~~~~~~
# Set up a search that finds the files which you want to retrieve
# and capture the resulting search id. See this FAQ entry for alternatives
# to capture the search id: "Can the search ID of slk search be captured
# by a shell variable?". We do this in two steps in order to be able to
# capture the exit code of the search command.
search_id_raw=`slk search '{"$and": [{"path": {"$gte": "/path/to/your/archived"}}, {"resources.name": {"$regex": "out_data_00[256].tar"}}]}'`
# '$?' captures the exit code of the previous command (you can put it in
# the next line after each slk command).
if [ $? -ne 0 ]; then
>&2 echo "an error occurred in slk search call"
else
echo "search successful"
fi
search_id=`echo $search_id_raw | tail -n 1 | sed 's/[^0-9]*//g'`
echo "The search ID is ${search_id}"
#
# If we delimit the search query by `"` instead of `'` and escape all
# `$` and `"` in the query, then we might even use environment variables
# in the search query
#
# source_namespace=/path/to/your/archived
# slk search "{\"\$and\": [{\"path\": {\"\$gte\": \"${source_namespace}\"}}, {\"resources.name\": {\"\$regex\": \"out_data_00[256].tar\"}}]}"
#
# ~~~~~~~~~~~~ doing recursive retrievals ~~~~~~~~~~~~
# If you wish to retrieve a set of files that have been found by a search
# you can use "slk retrieve SEARCH_ID TARGET_FOLDER"
slk retrieve ${search_id} ${target_folder}
# '$?' captures the exit code of the previous command (you can put it in
# the next line after each slk command).
if [ $? -ne 0 ]; then
>&2 echo "an error occurred in slk retrieve call"
else
echo "retrieval successful"
fi
generate search string and retrieve files#
Download: retrieve_slurm_template_gen_search.sh
#!/bin/bash
# HINT:
# * You can change the values right of the "=" as you wish.
# * The "%j" in the log file names means that the job id will be inserted
#SBATCH --job-name=test_slk_retr_job # Specify job name
#SBATCH --output=test_job.o%j # name for standard output log file
#SBATCH --error=test_job.e%j # name for standard error output log
#SBATCH --partition=shared # partition name
#SBATCH --ntasks=1 # max. number of tasks to be invoked
#SBATCH --time=08:00:00 # Set a limit on the total run time
#SBATCH --account=xz1234 # Charge resources on this project
#SBATCH --mem=6GB
# make 'module' available when script is submitted from certain environments
source /sw/etc/profile.levante
# ~~~~~~~~~~~~ preparation ~~~~~~~~~~~~
module load slk
# set target folder for retrieval
target_folder=/work/xz1234/ex/am/ple
# create folder to retrieve into (target folder)
mkdir -p ${target_folder}
# set striping for target folder
# see https://docs.dkrz.de/doc/hsm/striping.html
# ON LEVANTE
lfs setstripe -E 1G -c 1 -S 1M -E 4G -c 4 -S 1M -E -1 -c 8 -S 1M ${target_folder}
# ON MISTRAL
#lfs setstripe -S 4M -c 8 ${target_folder}
# ~~~~~~~~~~~~ doing the search ~~~~~~~~~~~~
# !!! NOTE: '.*' is a regular expression, not a bash wildcard / glob !!!
search_query=`slk_helpers gen_file_query '/arch/bm0146/k204221/.*.nc'`
echo "The search query is ${search_query}"
search_id=$(eval "slk search '"${search_query}"' | tail -n 1 | cut -c12-20")
echo "The search ID is ${search_id}"
#
# If we delimit the search query by `"` instead of `'` and escape all
# `$` and `"` in the query, then we might even use environment variables
# in the search query
#
# source_namespace=/path/to/your/archived
# slk search "{\"\$and\": [{\"path\": {\"\$gte\": \"${source_namespace}\"}}, {\"resources.name\": {\"\$regex\": \"out_data_00[256].tar\"}}]}"
#
# ~~~~~~~~~~~~ doing recursive retrievals ~~~~~~~~~~~~
# If you wish to retrieve a set of files that have been found by a search
# you can use "slk retrieve SEARCH_ID TARGET_FOLDER"
slk retrieve ${search_id} ${target_folder}
# '$?' captures the exit code of the previous command (you can put it in
# the next line after each slk command).
if [ $? -ne 0 ]; then
>&2 echo "an error occurred in slk retrieve call"
else
echo "retrieval successful"
fi
script with input arguments: insert search id and target folder#
Use cases for this script are presented here.
Download: retrieve_slurm_template_search_result_02arguments.sh
Run as:
sbatch --account=XY1234 ./retrieve_slurm_template_search_result_02arguments.sh SEARCH_ID /TARGET_FOLDER
Script:
#!/bin/bash
# HINT:
# * You can change the values right of the "=" as you wish.
# * The "%j" in the log file names means that the job id will be inserted
#SBATCH --job-name=test_slk_retr_job # Specify job name
#SBATCH --output=test_job.o%j # name for standard output log file
#SBATCH --error=test_job.e%j # name for standard error output log
#SBATCH --partition=shared # partition name
#SBATCH --ntasks=1 # max. number of tasks to be invoked
#SBATCH --time=08:00:00 # Set a limit on the total run time
#SBATCH --mem=6GB
# make 'module' available when script is submitted from certain environments
source /sw/etc/profile.levante
# check if proper number of arguments is supplied
if [[ "$#" -ne 2 ]]; then
>&2 echo "need two input arguments: SEARCH_ID TARGET_LOCATION"
exit 2
fi
# ~~~~~~~~~~~~ preparation ~~~~~~~~~~~~
module load slk
# set search id for retrieval
search_id=$1
# set target folder for retrieval
target_folder=$2
# create folder to retrieve into (target folder)
mkdir -p ${target_folder}
# set striping for target folder
# see https://docs.dkrz.de/doc/hsm/striping.html
# ON LEVANTE
lfs setstripe -E 1G -c 1 -S 1M -E 4G -c 4 -S 1M -E -1 -c 8 -S 1M ${target_folder}
# ~~~~~~~~~~~~ validation part 1 ~~~~~~~~~~~~
echo "~~~~~~~~~~~~~~~~~~~~~~~~ validation part 1 ~~~~~~~~~~~~~~~~~~~~~~~~"
echo "search id: ${search_id}"
echo "target folder: ${target_folder}"
echo "these files will be retrieved:"
slk list ${search_id}
echo ""
# ~~~~~~~~~~~~ retrievals ~~~~~~~~~~~~
# do the retrieval
echo "~~~~~~~~~~~~~~~~~~~~~~~~ retrieval ~~~~~~~~~~~~~~~~~~~~~~~~"
echo "starting: 'slk retrieve -s ${search_id} ${target_folder}'"
echo " existing files will be skipped"
slk retrieve -s ${search_id} ${target_folder}
exit_code=$?
if [ ${exit_code} -ne 0 ]; then
>&2 echo "an error occurred in slk retrieve call"
else
echo "retrieval successful"
fi
echo ""
echo "~~~~~~~~~~~~~~~~~~~~~~~~ validation part 2 ~~~~~~~~~~~~~~~~~~~~~~~~"
echo "last 10 lines of the slk log file:"
tail ~/.slk/slk-cli.log -n 10
exit ${exit_code}