Known Issues (read this!)¶
file version: 19 April 2022
current slk version: 3.3.21
slk issues on Levante¶
We are experiencing two issues on Levante that affect the usage of slk. You cannot do anything about them except run slk again a few seconds later.
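Because the only remedy is to repeat the call, a small retry wrapper can help in scripts. The following is a minimal sketch and not part of slk: the function name `retry` and its arguments are made up for illustration, and it assumes that the failing command returns a non-zero exit code.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# (hypothetical helper, not part of slk)
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts were made.
retry() {
    local max_attempts=$1
    local delay=$2
    shift 2
    local attempt=1
    until "$@"; do
        if [ "${attempt}" -ge "${max_attempts}" ]; then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${attempt} failed, retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "${delay}"
    done
}

# example call with a placeholder path:
# retry 3 10 slk list /ex/am/ple
```

The wrapper returns 1 when all attempts fail, so the original exit code of the command is not preserved.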
slk/slk_helpers terminate directly after start with “Name or service not known” or “Unhandled error occurred”¶
The error that slk_helpers prints to the command line is
archive.dkrz.de: Name or service not known
The error, which slk
prints to the command line is
ERROR: Unhandled error occurred, please check logs
However, if you take a look into the slk
log (~/.slk/slk-cli.log
) then you’ll find the same error message as the slk_helpers
display:
2022-04-07 21:45:30 ERROR archive.dkrz.de: Name or service not known
The error is the same. slk seems to have issues getting the proper routing to the StrongLink constellation.
slk terminates while running with “A fatal error has … Java …” and “SIGBUS”¶
This SIGBUS-related error occurs not only when slk is used but also when other programs are called on Levante. ATOS is investigating it.
slk archive/retrieve may use much memory and CPU time – careful with parallel slk calls on one node¶
slk archive
and slk retrieve
are fast but memory hungry. Additionally, these commands run many threads in parallel when large files or many files are transferred. Running many of such slk
calls in parallel on one node as one user (a) uses up a lot of memory and (b) causes issues in the thread management. As a rule of thumb, 6 to 8 GB of memory should be assumed for each slk
call. However, the exact memory requirement depends on the amount of data that is archived or retrieved by slk.
Be aware that on most Levante nodes, 2 GB of memory is allocated to each physical CPU core. This limit is enforced by the operating system, and processes exceeding the allowed memory usage will be killed. Thus, one should request at least three physical CPU cores for major archival or retrieval tasks on Levante.
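The relation between memory need and requested cores can be written as a small calculation. This is only a sketch: the helper name `cores_needed` is hypothetical, and the default of 2 GB per core is the value stated above for most Levante nodes.

```shell
# cores_needed MEM_GB [GB_PER_CORE]
# (hypothetical helper) smallest number of physical cores whose
# memory share covers MEM_GB; GB_PER_CORE defaults to 2
cores_needed() {
    local mem_gb=$1
    local gb_per_core=${2:-2}
    # integer division, rounded up
    echo $(( (mem_gb + gb_per_core - 1) / gb_per_core ))
}

cores_needed 6   # prints 3
cores_needed 8   # prints 4
```

With the rule of thumb of 6 to 8 GB per slk call, this yields three to four cores per call.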
slk is hanging / unresponsive¶
A slk
call runs for some time (even a few hours) and nothing seems to happen. There are several possible reasons for this.
lustre file system is hanging¶
Please check whether /home
is hanging. If /home
is hanging, slk
cannot access its login token and cannot write into its log. Therefore, slk
hangs when /home
is hanging.
slk retrieve does not hang but the tape recall takes very long¶
When many retrieve/recall requests of files from tape are processed, the individual calls of slk retrieve
might take longer than normal.
one or more source files have 0 byte size¶
Please check whether you are archiving a file of 0 Byte size. slk archive
and slk retrieve
hang when such a file is archived or retrieved, respectively.
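Local files of 0 byte size can be found with standard tools before an archival is started. A minimal sketch, in which the function name `list_empty_files` and the path in the usage comment are placeholders:

```shell
# list_empty_files DIR   (hypothetical helper)
# Prints all regular files of 0 byte size below DIR.
list_empty_files() {
    find "$1" -type f -size 0
}

# example with a placeholder path:
# if [ -n "$(list_empty_files /work/project/user/data)" ]; then
#     echo "0 byte files found, not archiving" >&2
# fi
```

This only covers the archival direction; files that are already in StrongLink have to be checked there.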
slk retrieve hangs due to read-write access on one tape¶
When data is written from the HSM-cache to a tape and a slk retrieve
call targets a file on the same tape then the slk retrieve
call hangs until it is killed. StrongLink does not write onto HPSS tapes. Thus, this issue can only arise if files are retrieved that (a) have been archived after 1 November 2021 or (b) have been retrieved at least once since 1 November 2021. The latter is the case because files retrieved via slk from HPSS tapes are automatically written to new StrongLink tapes.
Running commands without -R¶
Non-recursiveness is interpreted differently in StrongLink than defined in POSIX. If a namespace/directory (not a file) is given as input to the commands slk archive
, slk retrieve
and slk tag
, all files in this namespace/directory are affected. In contrast, cp
and rm
would throw an error that -r
is missing. When -R
is set, all sub-namespaces are also affected.
slk writes no output in non-interactive mode¶
All slk commands except for slk list
do not print output to the stdout and stderr streams (i.e. command line output) when they are in non-interactive mode, e.g. when running in SLURM jobs. Please capture the exit code of each slk archive
call and check whether it equals 0
. If not, an error occurred. Details on the error can be found in the slk log file ~/.slk/slk-cli.log
. However, when you run many slk
commands in parallel, the slk log becomes hard to read. Please print the time stamp (i.e. via date
) when the error occurred to be able to find the details in the slk log later on. See the next code block on how to do this.
The exit code of the previous program call is stored in $?
. Example:
$ slk archive /work/project/user/data /ex/am/ple/blub
...
$ echo $?
0
# or 1 or higher
In a bash/batch script it could look like this:
# ...
slk archive /work/project/user/data /ex/am/ple/blub
exit_code=$?
# print exit code with prefix so that it is easy to `grep`
echo "exit code: ${exit_code}"
if [ ${exit_code} -ne 0 ]; then
# print date
date
fi
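Once a time stamp has been printed, the matching log entries can be looked up with grep. This is a sketch under the assumption that every log entry starts with a time stamp of the form `2022-04-07 21:45:30`, as in the log excerpts on this page, and that the log uses the same time zone as `date`. The function name `slk_log_at` is hypothetical.

```shell
# slk_log_at STAMP [LOGFILE]   (hypothetical helper)
# Prints all log lines whose time stamp starts with STAMP,
# e.g. "2022-04-07 21:45" for one minute or "2022-04-07 21" for one hour.
slk_log_at() {
    local stamp=$1
    local logfile=${2:-"${HOME}/.slk/slk-cli.log"}
    grep "^${stamp}" "${logfile}"
}

# a matching time stamp can be printed in the job script with:
# date '+%Y-%m-%d %H:%M'
```

Continuation lines without a time stamp, such as Java stack traces, are not matched by this filter.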
slk never writes to stderr¶
Error output of slk is written to the stdout
stream instead of the stderr
stream. If slk output in non-interactive mode was activated (it is not!) then you would find all error output in the SLURM stdout (not stderr) file when running jobs on mistral.
difference: slk move and slk rename¶
The Linux mv
can move and rename files. The slk move
can just move files/namespaces from one namespace to another namespace. Renaming can only be performed by slk rename
. Both commands can only target one file/namespace at a time. Wildcards are not supported.
slk archive compares file size and timestamp prior to overwriting files¶
slk archive
compares file size and timestamp to decide whether to overwrite a file or not. rsync
does it the same way. There might be rare situations when an archived file should be overwritten by another file with the same name, size and timestamp: this would fail.
Availability of archived data and modified metadata might be delayed by a few seconds¶
StrongLink is a distributed system. Metadata is stored in a distributed metadata database. Some operations might take a few seconds until their results are visible because they have to be synchronized amongst different nodes.
Please wait a few seconds before you retrieve a file that was just archived.
A file listed by slk list is not necessarily available for retrieval yet¶
The location, name and size of a file are metadata. These metadata are written into the StrongLink metadata database when an archival process starts. slk list
only prints metadata. Hence, if slk list
lists a file, which is e.g. part of a file set currently uploaded in a batch job, this file is not necessarily fully uploaded yet. Similarly, aborted slk archive
calls can produce a file’s metadata entry without correct data. Such a file can be retrieved without error. Please see failed or canceled slk archive and slk retrieve calls leave file fragments for details on file fragments.
failed or canceled slk archive and slk retrieve calls leave file fragments¶
issues during archival¶
A file fragment remains in StrongLink if slk archive
did not terminate properly during an archival process. Metadata is available for this file fragment and it can be retrieved. It has no checksum. The latter is because some metadata – like checksums – will be written after the archival process has finished successfully. The existence of checksums can be checked via slk_helpers checksum GNS_PATH
. In the case of netCDF files, the header section might be copied properly. Thus, an ncdump -h
might be successfully applied on a file fragment.
These fragments might occur when a user aborts slk archive
(CTRL + C), an ssh connection breaks or a SLURM job is killed due to a timeout. More than one file might be affected because multiple files can be archived in parallel.
issues during retrieval¶
If slk retrieve
does not terminate properly during a retrieval process, a file fragment might be created. These file fragments have temporary file names containing the original FILENAME
: ~FILENAME14620203101828317173.slkretrieve
. The reasons for improper termination of slk retrieve
are the same as for slk archive
. More than one file might be affected because multiple files can be retrieved in parallel.
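Leftover fragments of this kind can be located by their file name pattern. A minimal sketch, assuming that all fragments follow the pattern shown above (leading `~`, suffix `.slkretrieve`); the function name is hypothetical:

```shell
# list_retrieve_fragments DIR   (hypothetical helper)
# Prints all files below DIR that look like slk retrieve fragments.
list_retrieve_fragments() {
    find "$1" -type f -name '~*.slkretrieve'
}

# example with a placeholder path:
# list_retrieve_fragments /work/project/user/retrieved
```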
Commonly, a file was correctly retrieved when it has its original filename and when the exit code of slk retrieve
is 0
(echo $?
directly after retrieval). To be 100% sure that the file was correctly retrieved, you can compare the checksum of the retrieved file with the checksum stored in StrongLink. If there is no checksum stored in StrongLink, the source file is already incomplete.
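The comparison on the local side can be sketched as follows. The example assumes that the checksum stored in StrongLink is a SHA-512 checksum and that it has been copied by hand from the output of slk_helpers checksum; the output format of that command is deliberately not parsed here. The function name `verify_sha512` is hypothetical.

```shell
# verify_sha512 FILE EXPECTED_CHECKSUM   (hypothetical helper)
# Returns 0 if the SHA-512 checksum of FILE equals EXPECTED_CHECKSUM.
verify_sha512() {
    local actual
    actual=$(sha512sum "$1" | cut -d ' ' -f 1)
    [ "${actual}" = "$2" ]
}

# example with placeholders:
# verify_sha512 test.nc "<checksum from slk_helpers checksum>" && echo "OK"
```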
Pagination mode of slk list¶
When slk list
is used in interactive mode without piping its output into another command, it will print its output in “pagination mode”. This means that only 25 results are printed “per page” and the user has to “turn the page” manually by pressing Return
/Enter
. Turning a page back is not possible. Even if there are fewer than 25 results, pagination mode is entered and the user has to press Return
/Enter
to leave the pagination mode. When a user leaves the pagination mode regularly, the terminal is cleared as CTRL + L would do. This behavior is by design and cannot be changed. If one wants to prevent the terminal from being cleared or does not want to browse through 30 pages, one should abort slk list
with CTRL + C. We recommend using slk list
in combination with cat
, less
, more
or similar tools in order to avoid the pagination mode. Below you will find an example.
Please note that the output of slk list NAMESPACE
and slk list NAMESPACE | cat
differs in the last line. This might be important when you create scripts around slk list
.
slk list
in pagination mode:
$ slk list /k204221_test
drwxrwxrwx- k204221 bm0146 24 Jun 2021 20210624_test
drwxrwxrwx- k204221 bm0146 25 Jun 2021 20210625_test
drwxrwxrwx- k204221 bm0146 22 Jun 2021 abc
drwxrwxrwx- k204221 bm0146 24 Jun 2021 blubber
drwxrwxrwx- k204221 bm0146 22 Jun 2021 defg
drwxrwxrwx- k204221 bm0146 22 Jun 2021 memory_issue_testing
drwxrwxrwx- k204221 bm0146 22 Jun 2021 sbds_test_data
drwxrwxrwx- k204221 bm0146 22 Jun 2021 sbds_test_data_b
drwxrwxrwx- k204221 bm0146 22 Jun 2021 test
drwxrwxrwx- k204221 ka1209 22 Jun 2021 test_20210617
drwxrwxrwx- k204221 bm0146 22 Jun 2021 test_20210622
drwxrwxrwx- k204221 ka1209 22 Jun 2021 testing
Files 1-12 of 12
Avoid pagination mode of slk list
:
$ slk list /k204221_test | cat
drwxrwxrwx- k204221 bm0146 24 Jun 2021 20210624_test
drwxrwxrwx- k204221 bm0146 25 Jun 2021 20210625_test
drwxrwxrwx- k204221 bm0146 22 Jun 2021 abc
drwxrwxrwx- k204221 bm0146 24 Jun 2021 blubber
drwxrwxrwx- k204221 bm0146 22 Jun 2021 defg
drwxrwxrwx- k204221 bm0146 22 Jun 2021 memory_issue_testing
drwxrwxrwx- k204221 bm0146 22 Jun 2021 sbds_test_data
drwxrwxrwx- k204221 bm0146 22 Jun 2021 sbds_test_data_b
drwxrwxrwx- k204221 bm0146 22 Jun 2021 test
drwxrwxrwx- k204221 ka1209 22 Jun 2021 test_20210617
drwxrwxrwx- k204221 bm0146 22 Jun 2021 test_20210622
drwxrwxrwx- k204221 ka1209 22 Jun 2021 testing
Files: 12
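For scripts that only need the names, the piped output can be filtered further. This is a sketch based solely on the output format shown above: it assumes that the summary line has the form `Files: N`, that the name is the last column, and that names do not contain whitespace. The function name `slk_list_names` is hypothetical.

```shell
# slk_list_names   (hypothetical filter)
# Reads the output of "slk list ... | cat" on stdin and prints only
# the last column (the name), dropping the "Files: N" summary line.
slk_list_names() {
    grep -v '^Files: [0-9][0-9]*$' | awk 'NF > 0 { print $NF }'
}

# example with a placeholder path:
# slk list /ex/am/ple | slk_list_names
```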
slk tag cannot be applied on individual files¶
slk tag
cannot be applied on individual files but only on namespaces. If it is applied on a namespace, all files in this namespace are assigned the metadata provided in the slk tag
call. The namespace itself does not get any metadata assigned. If -R
is set, also all files in sub-namespaces are assigned the metadata.
slk does not have a --version flag¶
Instead, it has a version command: slk version
Update interval of progress bars (slk archive, group, owner, retrieve, tag)¶
Progress bars are updated per file or per block of n
files. If you archive a folder with three files of 99 GB, 550 MB and 450 MB size, you will not see any update of the progress bar for 99% of the archival time while the large 99 GB file is archived; the progress bar will then jump from 0% to 99%. If you tag a few files, the progress bar will remain at 0% for a long time and suddenly jump to 100%.
Using slk list to print search results¶
slk list
prints only the file names, regardless of whether we print the content of a namespace or the result of a search. However, a search might find files in arbitrary namespaces. Thus, it would be helpful to print the path/namespace of each file when search results are listed. This is not the case. Currently, you cannot find out in which namespace(s) your search results are located.
slk performance on different node types¶
We suggest running slk archive
and slk retrieve
on the mistralpp
and compute
/compute2
nodes. The run time on the mistralpp
nodes considerably depends on the activity of other users on these nodes.
Please do not run slk archive
and retrieve
on the mistral login nodes (mlogin10X
) when you archive large amounts of data because slk
causes high CPU load and uses much memory.
The available memory per job on the shared
nodes is very low. Therefore, slk archive
and slk retrieve
are slower than on other nodes. The run time can be expected to be two to four times as long as on the mistralpp
and compute
/compute2
nodes.
group memberships of user updated on login¶
If a user is added to a new group/project, this information is not automatically passed to StrongLink. Instead, the user has to run slk login
again. Background: StrongLink caches LDAP data of each user and only updates its cache on a new login.
LDAP user not known to StrongLink prior to first login¶
If a user never logged in to StrongLink, his/her user will not exist in StrongLink (i.e. chown to this user is not possible). Background: There are many users listed in the DKRZ LDAP that will never access StrongLink. Keeping all these users in the StrongLink user database is not reasonable.
slk retrieve does not overwrite files but creates duplicates¶
When a file already exists, it retrieves a copy and inserts .DUPLICATE_FILENAME.[ID].[VERSION]
between name and extension of the file. However, slk retrieve
will overwrite these DUPLICATE
files without warning. Consecutive retrievals will overwrite this file even if it is modified.
VERSION
indicates the file version in StrongLink. If you modify a file and archive it a second time, the version will be incremented by one. Commonly, the version is not visible to you. Old file versions are not kept. Metadata of old versions is partly kept.
Do not archive such a DUPLICATE
file because it might overwrite itself during retrieval.
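Such files can be found by their name before a directory is archived again. A minimal sketch, assuming the naming scheme described above; the function name is hypothetical:

```shell
# list_duplicate_files DIR   (hypothetical helper)
# Prints all files below DIR whose name contains the DUPLICATE_FILENAME marker.
list_duplicate_files() {
    find "$1" -type f -name '*.DUPLICATE_FILENAME.*'
}

# example with a placeholder path:
# list_duplicate_files /work/project/user/retrieved
```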
“slk retrieve /source/ /target” and “slk retrieve /source /target” are not the same¶
slk retrieve
works the same as rsync
with respect to a /
appended to the source path.
With /
appended to the source path:
$ ls /ex/am/ple/bm0146/k20422/dm/retrieve_us
test.txt
$ slk retrieve -R /ex/am/ple/bm0146/k20422/dm/retrieve_us/ .
...
$ ls .
test.txt
Without /
at the end of the source path:
$ ls /ex/am/ple/bm0146/k20422/dm/retrieve_us
test.txt
$ slk retrieve -R /ex/am/ple/bm0146/k20422/dm/retrieve_us .
...
$ ls .
retrieve_us
$ ls ./retrieve_us
test.txt
slk group does not print visible error messages when it fails¶
Short version¶
The progress bar of slk group
does not properly print the full number of files to modify. It always prints Files changed: n-1/n
or Files changed: n/n
with increasing n
over time. When the slk group
call stops working due to an internal error, the user cannot tell whether the currently printed number of modified files n
is the number of all available files. Hence, it is important either to capture the exit code of slk group
or to have a look into the slk log (~/.slk/slk-cli.log
) afterwards.
Long Version¶
When slk group
is recursively applied to a folder with many files in it, the slk command already starts modifying the first files while StrongLink is still collecting files. The progress bar will show 99%
to 100%
during the whole time while the file count rises:
$ slk group -R 200524 /ex/am/ple/bm0146/k20422/dm/group_example
[========================================|] 100% complete. Files changed: 10/10, [150M/150M].
[========================================|] 100% complete. Files changed: 11/11, [152M/152M].
[========================================|] 100% complete. Files changed: 19/19, [214M/214M].
...
If some files cannot be modified, this is indicated as follows:
$ slk group -R 200524 /ex/am/ple/bm0146/k20422/dm/group_example
[========================================|] 100% complete. Files changed: 15426/15583, [7.9T/8.0T]. Files failed: 157.
However, when slk group
finishes, we do not know whether all possible files were modified or whether slk group was stopped in between (see next example):
$ slk group -R 200524 /k204221_test/testing/stability_20211012_size_500mb_40
[========================================|] 100% complete. Files changed: 15426/15583, [7.9T/8.0T]. Files failed: 157.
$ slk group -R 200524 /k204221_test/testing/stability_20211012_size_500mb_40
[=======================================/] 100% complete. Files changed: 31204/31227, [16.0T/16.1T]. Files failed: 157.
Both slk group
calls were applied to the same folder. Therefore, the number of modified files should be the same, but it is not. The reason for this discrepancy is that the first slk group
command stopped with exit code 1
after 15583 files. Hence, it is important either to capture the exit code of slk group
or to have a look into the slk log (~/.slk/slk-cli.log
) afterwards.
slk archive might create namespaces with “.” and “..” as names but slk retrieve interprets them¶
.
and ..
will be considered as normal names of namespaces in StrongLink. slk move
and slk rename
prevent the usage of .
and ..
(and moving into these). However, slk archive
does not prevent this yet. The examples below should clarify this.
When namespaces with names .
and ..
are retrieved, these names are interpreted by the shell.
# create source data
$ mkdir none dot
$ echo "none" > none/a.txt
$ echo "." > dot/a.txt
# archival
$ slk archive none/a.txt /ex/am/ple/
[========================================\] 100% complete. Files archived: 1/1, [5B/5B].
$ slk archive dot/a.txt /ex/am/ple/.
[========================================-] 100% complete. Files archived: 1/1, [2B/2B].
# see what was archived
$ slk list /ex/am/ple | cat
drwxrwx---- stronglink group0 10 Nov 2021 .
-rw-r--r--- stronglink group0 5 10 Nov 2021 a.txt
Files: 2
$ slk list /ex/am/ple/. | cat
-rw-r--r--- stronglink group0 2 10 Nov 2021 a.txt
# retrieve top folder recursively
$ slk retrieve -R /ex/am/ple retr_overwrite_20211109_a
[========================================|] 100% complete. Files retrieved: 2/2, [7B/7B].
# check what is there
$ ls -la retr_overwrite_20211109_a/overwrite_20211109_a/
total 9
drwxr-xr-x 2 k204221 bm0146 4096 Nov 10 00:12 .
drwxr-xr-x 3 k204221 bm0146 4096 Nov 10 00:12 ..
-rw------- 1 k204221 bm0146 2 Nov 10 00:12 a.DUPLICATE_FILENAME.52933184010.1.txt
-rw------- 1 k204221 bm0146 5 Nov 10 00:12 a.txt
slk bad_input returns exit code 0¶
slk BAD_INPUT
(like slk acrhvie
) prints the help and returns 0
as exit code. It is said to return exit code 0
because the help is printed successfully. However, it should be 1
or higher.
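Until this is changed, a script can check the command name itself before calling slk. The following sketch is not part of slk; the function name is hypothetical, and the list only contains the commands mentioned on this page, so it may be incomplete.

```shell
# is_known_slk_command NAME   (hypothetical helper)
# Returns 0 if NAME is one of the slk commands mentioned on this page.
is_known_slk_command() {
    case "$1" in
        archive|chmod|delete|group|list|login|move|owner|rename|retrieve|search|tag|version)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# example:
# is_known_slk_command acrhvie || echo "unknown slk command" >&2
```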
slk cannot handle a path with // (double slash)¶
slk does not substitute //
by /
. Instead, it creates or looks for a namespace with an empty string as name (//
=> /
+ empty string + /
). Empty strings as names for namespaces are prohibited. Therefore, commands fail when there is a //
in a file path.
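Paths that are assembled in scripts can be cleaned before they are passed to slk. A minimal sketch using tr; the function name `normalize_path` is hypothetical:

```shell
# normalize_path PATH   (hypothetical helper)
# Squeezes every run of consecutive slashes into a single slash.
normalize_path() {
    printf '%s\n' "$1" | tr -s '/'
}

normalize_path /ex/am//ple///a.txt   # prints /ex/am/ple/a.txt
```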
“Login Unsuccessful - Incorrect Credentials” or “Session key has expired” when StrongLink not available¶
Issue description¶
The error message WARNING: Session key has expired. Please login again:
(interactive usage) or ERROR Session key has expired, unable to login in non-interactive mode
(non-interactive usage; e.g. in SLURM batch script) is printed but the session key is not yet 30 days old.
or
You log in to slk with the correct login credentials but get the error Login Unsuccessful - Incorrect Credentials
.
Reason¶
slk
might print misleading errors when no connection to the StrongLink instance is possible or when the StrongLink instance is in a bad state, e.g. during a scheduled maintenance. The errors differ between interactive and non-interactive mode.
All slk
commands in interactive mode except for slk login
:
$ slk group ka1209 /k204206_test/ECHAM3_T42_22056HMBG_APRL.1-174.sellonlatbox_28.3_33.3_55.967_62.967.grb
WARNING: Session key has expired. Please login again:
Username [k204221]: ^C
slk login
:
Login Unsuccessful - Incorrect Credentials
slk
commands in non-interactive mode:
ERROR Session key has expired, unable to login in non-interactive mode
Filtering slk list results with “*”¶
use * to replace parts of the file name¶
This works fine:
$ slk list /ex/am/ple/\*.nc
...
$ slk list '/ex/am/ple/*.nc'
...
The shell (bash, ksh, …) must be prevented from interpreting *
. This can be done with either of the two approaches above.
escape * to print the content of a namespace containing * in its name¶
Assuming we have a namespace with the name *
, which is allowed, we might try this to list its content:
$ slk list '/ex/am/ple/\*'
...
This will prevent slk list
successfully from interpreting the *
. However, when a *
is in the path, slk list
automatically goes into “filter mode”. This means that the content of the namespace /ex/am/ple
will be filtered for content with the name *
. Hence, we will just get *
printed and not its content.
using * to replace parts of namespace names¶
Using *
to replace parts of the names of namespaces does not work. Example:
$ slk list /ex/am/ple/\*/\*.nc
...
$ slk list '/ex/am/ple/*/*.nc'
...
These two list commands will look for *.nc
in /ex/am/ple
and not in every sub-namespace of /ex/am/ple
.
slk chmod -R modifies many more file permissions than it should¶
slk chmod -R
creates a tree of all files and of all namespaces in which these files are located. slk chmod -R
seems to iterate over the tree in a wrong way so that each file’s permissions are modified not once but 2^[namespace_depth - 1]
times.
example 1¶
$ echo "abc" > test.txt
$ slk archive test.txt /ex/am/ple/ex1/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z
[========================================/] 100% complete. Files archived: 1/1, [...].
# that's OK
$ slk chmod -R 755 /ex/am/ple/ex1/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z
[========================================\] 100% complete. Files changed: 1/1, [4B/4B].
# that's not OK
$ slk chmod -R 755 /ex/am/ple/ex1
^C ^C===========\] 100% complete. Files changed: 4431/4431, [...].
example 2¶
# archive five files into one parent namespace
$ slk archive *.nc /ex/am/ple/ex2a
[========================================/] 100% complete. Files archived: 5/5, [10.8K/10.8K].
$ slk chmod -R 755 /ex/am/ple/ex2a
[========================================\] 100% complete. Files changed: 6/6, [10.8K/10.8K].
# ================>>>>>>>>>>>> SIX RESOURCES MODIFIED <<<<<<<<<<<<================
# archive five files into three sub-namespaces:
$ slk archive *.nc /ex/am/ple/ex2b/d1/d2/d3
[========================================-] 100% complete. Files archived: 5/5, [10.8K/10.8K].
$ slk chmod -R 755 /ex/am/ple/ex2b
[========================================|] 100% complete. Files changed: 55/55, [86.3K/86.3K].
# ================>>>>>>>>>>>> FIFTY-FIVE RESOURCES MODIFIED <<<<<<<<<<<<================
example 3¶
echo "abc" > test.txt
# no subfolder; n=0
slk archive test.txt /ex/am/ple/test01
slk chmod -R 755 /ex/am/ple/test01
# => 2 resources (1x file, 1x namespace)
# subfolder; n=1
slk archive test.txt /ex/am/ple/test01/test02
slk chmod -R 755 /ex/am/ple/test01
# => 5 resources (2x same file, 3x namespaces: 2x test02, 1x test01)
# subfolder in subfolder; n=2
slk archive test.txt /ex/am/ple/test01/test02/test03
slk chmod -R 755 /ex/am/ple/test01
# => 11 resources (4x same file, 7x namespaces: 4x test03, 2x test02, 1x test01)
# subfolder in subfolder in subfolder; n=3
slk archive test.txt /ex/am/ple/test01/test02/test03/test04
slk chmod -R 755 /ex/am/ple/test01
# => 23 resources (8x same file, 15x namespaces: 8x test04, 4x test03, 2x test02, 1x test01)
# ... n ...
...
# => 2^n * FILES + 2^(n+1) - 1 resources: 2^n times each file; 2^(n+1) - 1 namespaces
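The formula at the end of example 3 can be evaluated with shell arithmetic to check it against the numbers reported in the examples. This is only a worked calculation; the function name is hypothetical, and n is the number of nested sub-namespaces below the namespace given to slk chmod -R.

```shell
# chmod_resource_count FILES N   (hypothetical helper)
# Evaluates 2^N * FILES + 2^(N+1) - 1.
chmod_resource_count() {
    local files=$1
    local n=$2
    echo $(( (1 << n) * files + (1 << (n + 1)) - 1 ))
}

chmod_resource_count 1 3   # prints 23 (example 3, n=3)
chmod_resource_count 5 0   # prints 6  (example 2, ex2a)
chmod_resource_count 5 3   # prints 55 (example 2, ex2b)
```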
How to search non-recursively in a namespace¶
slk search
cannot search non-recursively in a namespace provided via path
. As a workaround, please get the object id of the particular namespace via slk_helpers exists
and, then in your search query, use it as value for the search field resources.parent_id
(see slk Usage Examples)
Terminal cursor disappears if slk command with progress bar is canceled¶
If a slk command with a progress bar is canceled by the user, the shell cursor might disappear. One can make it re-appear by (a) running reset
or (b) starting vim
and leaving it directly (:q!
).
error “conflict with jdk/…” when the slk module is loaded¶
slk
needs a specific Java version that is automatically loaded with slk
. Having other Java versions loaded in parallel might cause unwanted side effects. Therefore, the system throws an error message and aborts.
slk needs a specific Java version¶
You might encounter an error like this:
$ slk list 12
CLI tools require Java 13 (found 1)
slk
needs a specific Java version. This Java version is automatically loaded when we load the slk module. If you have other Java modules loaded explicitly, please unload them prior to loading the slk module. If you loaded slk already, please: (1) unload slk, (2) unload all Java modules and (3) load slk again.
slk search yields RQL parse error¶
ERROR: Search failed. Reason: RQL parse error: No period found in collection field name ().
Either: Please consider using '
around your search query instead of "
to prevent operators starting with $
from being evaluated as bash variables.
Or: Please escape $
’s belonging to query operators when you use "
as delimiters of the query string.
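The effect of the different quoting styles can be reproduced without calling slk. The query below is only illustrative: the field name `resources.name` and the operator `$regex` are assumptions about the query syntax and are not taken from this page.

```shell
unset regex   # make sure that no shell variable of this name exists

# single quotes: the shell leaves $regex untouched
single='{"resources.name":{"$regex":"abc"}}'
# double quotes: the shell replaces $regex by an empty string
double="{\"resources.name\":{\"$regex\":\"abc\"}}"
# double quotes with escaped $: the operator survives
escaped="{\"resources.name\":{\"\$regex\":\"abc\"}}"

printf '%s\n' "${single}"    # {"resources.name":{"$regex":"abc"}}
printf '%s\n' "${double}"    # {"resources.name":{"":"abc"}}
printf '%s\n' "${escaped}"   # {"resources.name":{"$regex":"abc"}}
```

In the second variant the operator name is empty, which would be consistent with the empty parentheses at the end of the error message.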
slk login asks me to provide a hostname and/or a domain¶
If you are asked for this information the configuration is faulty. Please contact support@dkrz.de and tell us on which machine you are working.
Archival fails and Java NullPointerException in the log¶
This error message is printed in the log:
2021-07-13 08:33:03 ERROR Unexpected exception
java.lang.NullPointerException: null
at com.stronglink.slkcli.api.websocket.NodeThreadPools.getBestPool(NodeThreadPools.kt:28) ~[slk-cli-tools-3.1.62.jar:?]
at com.stronglink.slkcli.archive.Archive.upload(Archive.kt:191) ~[slk-cli-tools-3.1.62.jar:?]
at com.stronglink.slkcli.archive.Archive.uploadResource(Archive.kt:165) ~[slk-cli-tools-3.1.62.jar:?]
at com.stronglink.slkcli.archive.Archive.archive(Archive.kt:77) [slk-cli-tools-3.1.62.jar:?]
at com.stronglink.slkcli.SlkCliMain.run(SlkCliMain.kt:169) [slk-cli-tools-3.1.62.jar:?]
at com.stronglink.slkcli.SlkCliMainKt.main(SlkCliMain.kt:103) [slk-cli-tools-3.1.62.jar:?]
2021-07-13 08:33:03 INFO
This error indicates that there is an API issue. A reason might be that one or more StrongLink nodes went offline and the other nodes did not take over their connections yet. Please notify support@dkrz.de if you experience this error.
slk login ERROR: Unhandled error occurred, please check logs¶
The error message printed in the log starts with:
2022-03-25 14:39:50 ERROR No transformation found: class io.ktor.utils.io.ByteBufferChannel -> ...
status: 200 OK
response headers:
...
When the error occurs¶
You run slk login
, misspell your password on the first try and provide the correct password on the second or later try.
Solving the error¶
Run slk login
a second time.
permissions of retrieved files are “rw-------” although umask is set differently¶
slk retrieve
ignores umask
and partly ignores ACLs (via setfacl
). Instead, it always sets rw-------
.
While slk retrieve
is copying a file, nobody should interact with this file. Therefore, no read/write/execute permissions are granted to other users than the owner. After the retrieval is finished, the permissions should be updated according to umask and ACLs. However, this is not done.
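Until this is done by slk, the permissions can be adjusted by hand after a successful retrieval. The sketch below applies the current umask to the given files in the same way as it would be applied to a newly created regular file (base mode 666). It does not consider ACLs, and the function name `apply_umask` is hypothetical.

```shell
# apply_umask FILE [FILE...]   (hypothetical helper)
# Sets the permissions of the given files to 666 masked by the current umask.
apply_umask() {
    local mode
    # the leading 0 makes the shell read the umask as an octal number
    mode=$(printf '%o' $(( 0666 & ~0$(umask) )))
    chmod "${mode}" "$@"
}

# example with a placeholder file name (umask 022 => rw-r--r--):
# apply_umask test.txt
```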
slk archive: Exception …: lateinit property websocket has not been initialized¶
Full error message on the command line:
Exception in thread "Thread-357" kotlin.UninitializedPropertyAccessException: lateinit property websocket has not been initialized
at com.stronglink.slkcli.queue.ArchiveWebsocketWorker.closeConnection(ArchiveWebsocketWorker.kt:146)
at com.stronglink.slkcli.queue.WebsocketWorker.run(WebsocketWorker.kt:67)
Error message in the log:
2022-03-01 13:50:28 ERROR Error in websocket worker
java.util.concurrent.CompletionException: java.net.http.WebSocketHandshakeException
at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367) ~[?:?]
Reason¶
Probably, slk archive
was run with --streams 10
or a similarly high number like --streams 16
or --streams 32
Solution¶
Please use slk archive --streams N
with a maximum value of 4
for N
. Transfer rates of 1 to 2 GB/s are possible with this configuration when the system is not busy.
slk archive runs infinitely when files of 0 byte size are archived¶
If we archive files with slk archive
and at least one file has a size of 0 Byte then slk archive
will archive all files but will not quit. Instead, it will run infinitely. When run in a batch job, it is not possible to determine whether slk archive
is still archiving or whether it is hanging due to a 0 Byte file.
slk delete failed, but nevertheless file was deleted¶
Issue description¶
We run slk delete /abc/def/ghi.txt
but slk delete fails due to an unknown reason. Repeated calls of slk delete /abc/def/ghi.txt
fail because the target file does not exist anymore.
Reason¶
The reason has not been fully identified yet. This one is the most probable reason: When slk delete
sends a deletion request to StrongLink, it waits a certain time for the response of the StrongLink instance to return. If this does not happen or if the reply of another confirmation step does not return in time (= timeout), slk
assumes that the command failed.
Solution¶
Please carefully check whether files were actually deleted when a slk delete
did not finish successfully.