Reference: StrongLink verify jobs#
file version: 17 Sept 2024
current software version: slk_helpers version 1.12.10
Introduction#
StrongLink can run so-called verify jobs, which check the integrity of files based on their size. A verify job is started via
slk_helpers submit_verify_job ...
StrongLink stores the expected file size at the beginning of an archival process. A verify job compares the expected file size against the actual size of a file. Checksums or other parameters are not considered in this process. Only files which pass this check are written to tape. Therefore, only cached files can be verified.
Verify jobs run in the same scheduling system as retrieval/recall jobs. This means that, if many recall jobs are waiting in the queue, new verify jobs are placed at the end of the queue and need to wait. You can check the status of a job via
slk_helpers job_status <job_id>
Please do not submit more than ten recall jobs at once.
When a verify job is finished, you can fetch the results from StrongLink via
slk_helpers result_verify_job <job_id>
If you would like all your newly archived files to be verified automatically on a weekly basis, please run
slk_wrapper_weekly_verify_job
This starts a SLURM job which starts a verify job and notifies you via email when a file verification fails.
Run time#
A verify job targeting a few thousand files runs for approximately one minute. If you provide more than 50 000 files, multiple verify jobs are submitted, each targeting up to 50 000 files. Run times which we experienced for larger numbers of files are:
10 000 files: 2.5 minutes
50 000 files: 6 minutes
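The splitting rule above can be sketched in a few lines of Python. This is only an estimate under the assumption that each verify job is filled up to the limit of 50 000 files; the function name `expected_verify_jobs` is hypothetical and not part of slk_helpers, and the actual number of submitted jobs is the one reported by `slk_helpers submit_verify_job`.

```python
import math

# Limit stated in the text above: at most 50 000 files per verify job.
MAX_FILES_PER_VERIFY_JOB = 50_000


def expected_verify_jobs(n_files):
    """Estimate how many verify jobs are needed for n_files target files.

    Assumption: every job is filled up to the limit before a new one is started.
    """
    if n_files < 0:
        raise ValueError("n_files must not be negative")
    return math.ceil(n_files / MAX_FILES_PER_VERIFY_JOB)


if __name__ == "__main__":
    for n in (8, 50_000, 50_001, 120_000):
        print(n, "files ->", expected_verify_jobs(n), "verify job(s)")
```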
Submitting a verify job#
When you submit a verify job, you receive a job id with which you can check the job status and fetch the verify report.
Submit a verify job for a namespace#
$ slk_helpers submit_verify_job /dkrz_test/netcdf/20240116c -R
Submitting up to 1 verify job(s) based on results of search id 704756:
search results: pages 1 to 1 of 1; visible search results: 8; submitted verify job: 236962
Number of submitted verify jobs: 1
Please remember the verify job id 236962
.
Submit a verify job for a list of files#
$ slk_helpers submit_verify_job /dkrz_test/netcdf/20240116c/test_netcdf_a.nc /dkrz_test/netcdf/20240116c/test_netcdf_b.nc
Submitting up to 1 verify job(s) based on results of search id 704757:
search results: pages 1 to 1 of 1; visible search results: 2; submitted verify job: 236963
Number of submitted verify jobs: 1
Please remember the verify job id 236963
.
Both commands might take a while because a search is performed in the background, which is slow in some situations. You can add the parameter -v for verbose output so that you see what the command is doing.
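If you call the command from a script, the job ids have to be extracted from its output. The following Python sketch does this with a regular expression. It assumes that every submitted job appears in the output as `submitted verify job: <id>`, as in the examples above; the function name `extract_verify_job_ids` is hypothetical and not part of slk_helpers.

```python
import re

# Pattern taken from the example output above ("submitted verify job: 236962").
JOB_ID_PATTERN = re.compile(r"submitted verify job:\s*(\d+)")


def extract_verify_job_ids(output):
    """Return all verify job ids found in the output of submit_verify_job."""
    return [int(job_id) for job_id in JOB_ID_PATTERN.findall(output)]


# Output copied from the namespace example above.
SAMPLE_OUTPUT = """\
Submitting up to 1 verify job(s) based on results of search id 704756:
search results: pages 1 to 1 of 1; visible search results: 8; submitted verify job: 236962
Number of submitted verify jobs: 1
"""

if __name__ == "__main__":
    print(extract_verify_job_ids(SAMPLE_OUTPUT))
```

The returned ids can then be passed to `slk_helpers job_status` and `slk_helpers result_verify_job`.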
Submit a verify job for files matching a RegEx#
Submit a verify job for all *.nc files in a namespace using a search and a search id. StrongLink interprets regular expressions in filenames. It does not interpret regular expressions in folder names, and it does not interpret bash globs / wildcards. Therefore, the regular expression .* has to be used in place of the wildcard *.
$ slk_helpers gen_file_query -R /dkrz_test/netcdf/20240116a/.*nc --cached-only
{"$and":[{"$and":[{"path":{"$gte":"/dkrz_test/netcdf/20240116a"}},{"resources.name":{"$regex":".*nc"}}]},{"smart_pool":"slpstor"}]}
$ slk search '{"$and":[{"$and":[{"path":{"$gte":"/dkrz_test/netcdf/20240116a"}},{"resources.name":{"$regex":".*nc"}}]},{"smart_pool":"slpstor"}]}'
Search continuing... ...
Search ID: 704758
$ slk_helpers submit_verify_job --search-id 704758
Submitting up to 1 verify job(s) based on results of search id 704758:
search results: pages 1 to 1 of 1; visible search results: 8; submitted verify job: 236965
Number of submitted verify jobs: 1
OR simply
$ slk_helpers submit_verify_job /dkrz_test/netcdf/20240116a/.*nc -R
Submitting up to 1 verify job(s) based on results of search id 704758:
search results: pages 1 to 1 of 1; visible search results: 8; submitted verify job: 236965
Number of submitted verify jobs: 1
Both commands might take a while because a search is performed in the background, which is slow in some situations. You can add the parameter -v for verbose output so that you see what the command is doing.
$ slk_helpers submit_verify_job /dkrz_test/netcdf/20240116a/.*nc -R -v
Generating search query.
Search query is: '{"$and":[{"$and":[{"path":{"$gte":"/dkrz_test/netcdf/20240116a"}},{"resources.name":{"$regex":".*nc"}}]},{"smart_pool":"slpstor"}]}'.
Starting search query.
Search ID is: 704758
Search continuing. ........
Starting on page 1/1 and ending on page 1/1 of the search results of search 704758 (1000 search results per page).
Submitting up to 1 verify job(s) based on results of search id 704758:
Collecting search results from page 1 to page 1
Collecting search results 1 to 1000
Collected 8 search results from page 1 to page 1
Generate verify query
Submit verify query
search results: pages 1 to 1 of 1; visible search results: 8; submitted verify job: 236965
Number of submitted verify jobs: 1
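The search query printed in the examples of this section can also be assembled by hand. The Python sketch below replaces the wildcard * by .* and builds a query with the same structure as the one printed by `slk_helpers gen_file_query -R ... --cached-only` above. It only mirrors that one example: the function names are hypothetical, other characters of the pattern are passed through unchanged, and `gen_file_query` remains the authoritative way to generate queries.

```python
import json


def glob_to_regex(pattern):
    """Replace the wildcard * by the regular expression .* (all else unchanged)."""
    return pattern.replace("*", ".*")


def build_cached_file_query(namespace, name_regex):
    """Build a query shaped like the example query above (recursive, cached only)."""
    return {
        "$and": [
            {
                "$and": [
                    {"path": {"$gte": namespace}},
                    {"resources.name": {"$regex": name_regex}},
                ]
            },
            # Taken from the example; restricts the search to cached files.
            {"smart_pool": "slpstor"},
        ]
    }


def query_to_json(query):
    """Serialize the query compactly, without spaces, as in the example."""
    return json.dumps(query, separators=(",", ":"))


if __name__ == "__main__":
    regex = glob_to_regex("*nc")
    print(query_to_json(build_cached_file_query("/dkrz_test/netcdf/20240116a", regex)))
```

The printed string can be passed to `slk search` in single quotes, as shown in the example above.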
Check the status of a verify job#
The results of a verify job are provided as a verify report. A verify report should not be fetched before the verify job is finished. slk_helpers job_status can be used to check the processing state of a verify job. You can get the report if the state is COMPLETED or SUCCESSFUL. Otherwise, the report might not exist (e.g. in state QUEUED) or might be incomplete (e.g. in state PROCESSING).
$ slk_helpers job_status 156548
COMPLETED
$ slk_helpers job_status 157303
QUEUED (21)
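In a script, the output of `slk_helpers job_status` can be split into the state and the optional number in parentheses. The Python sketch below does this and decides whether the report can be fetched, based on the two states named above. The function names are hypothetical, and the meaning of the number in parentheses is not explained here, so it is returned without interpretation.

```python
import re

# States in which the verify report can be fetched, as stated above.
REPORT_READY_STATES = {"COMPLETED", "SUCCESSFUL"}

# Matches e.g. "COMPLETED" or "QUEUED (21)".
STATUS_PATTERN = re.compile(r"^\s*([A-Z_]+)(?:\s+\((\d+)\))?\s*$")


def parse_job_status(output):
    """Split the job_status output into (state, number or None)."""
    match = STATUS_PATTERN.match(output)
    if match is None:
        raise ValueError("unexpected job_status output: %r" % output)
    state, number = match.groups()
    return state, (int(number) if number is not None else None)


def report_is_ready(output):
    """Return True if the state allows fetching the verify report."""
    state, _ = parse_job_status(output)
    return state in REPORT_READY_STATES


if __name__ == "__main__":
    for text in ("COMPLETED", "QUEUED (21)"):
        print(text, "->", parse_job_status(text), report_is_ready(text))
```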
Getting results of a verify job#
Please run this command to fetch the results of the verify job:
$ slk_helpers result_verify_job 156548
Errors:
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_b.nc
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_c.nc
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_a.nc
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_f.nc
Erroneous files: 4
Four size-mismatch errors were detected. In this case, these files should be re-archived or deleted from the archive.
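To obtain the affected files as a list, for example as input for re-archiving, the output shown above can be parsed. The following Python sketch groups the file paths by error message. It assumes that each error line has the form `<message>: <absolute path>`, as in the example; the function name `parse_verify_result` is hypothetical.

```python
def parse_verify_result(output):
    """Group file paths by error message from result_verify_job output."""
    errors = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip empty lines, the "Errors:" header and the summary line.
        if not line or line == "Errors:" or line.startswith("Erroneous files:"):
            continue
        message, separator, path = line.rpartition(": ")
        if separator and path.startswith("/"):
            errors.setdefault(message, []).append(path)
    return errors


# Output copied from the example above.
SAMPLE_RESULT = """\
Errors:
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_b.nc
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_c.nc
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_a.nc
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_f.nc
Erroneous files: 4
"""

if __name__ == "__main__":
    for message, paths in parse_verify_result(SAMPLE_RESULT).items():
        print(message, len(paths))
        for path in paths:
            print("  ", path)
```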
Getting raw results of a verify job#
When the verify job is completed, the raw report can be fetched via slk_helpers job_report. It can be written into a file …
$ slk_helpers job_report 156548 --outfile verify_report_job_156548.txt
… or printed to the terminal
$ slk_helpers job_report 156548
StrongLink_Version,UNKNOWN
job_id,156548
job_name,slk_helpers Verify Job by user k204221
policy_type,VERIFY
job_status,COMPLETED
source_namespace,/dkrz_test/netcdf/20230925a
destination_pools,[37]
quick_check,true
update_deleted_files,false
email_job_log,false
job_start,Wed Oct 04 21:13:09 UTC 2023
job_end,Wed Oct 04 21:14:33 UTC 2023
scan_start,Wed Oct 04 20:56:18 UTC 2023
scan_end,Wed Oct 04 21:13:57 UTC 2023
io_start,Wed Oct 04 20:56:18 UTC 2023
io_end,Wed Oct 04 21:14:33 UTC 2023
scanned_directories,2
scanned_file_size,21104159996
scanned_files,13
scanned_skipped_no_copies,2
io_bytes,18400035132
io_file_not_found,1
io_files,5
io_verify_size_failed,5
create date,status,message,description
Wed Oct 04 21:14:33 UTC 2023,INFO,IO completed,The IO for job 156548 has completed
Wed Oct 04 21:13:57 UTC 2023,INFO,scanner completed,The scanner for job 156548 has completed
Wed Oct 04 21:13:11 UTC 2023,ERROR,File verification failed: Resource content size does not match record,Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_b.nc
Wed Oct 04 21:13:11 UTC 2023,ERROR,File verification failed: Resource content size does not match record,Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_a.nc
Wed Oct 04 21:13:11 UTC 2023,ERROR,File verification failed: Resource content size does not match record,Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_c.nc
Wed Oct 04 21:13:11 UTC 2023,ERROR,File verification failed: Resource content size does not match record,Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_f.nc
Wed Oct 04 21:13:11 UTC 2023,ERROR,File verification failed: File not found,File not found: /dkrz_test/netcdf/20230925a/file_001gb_g.nc
Wed Oct 04 21:13:11 UTC 2023,ERROR,File verification failed: Resource content size does not match record,Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_d.nc
Wed Oct 04 21:13:10 UTC 2023,INFO,Skipped Resource without RCR,Resource ID: 74481521010 / path: /dkrz_test/netcdf/20230925a/file_001gb_h.nc
Wed Oct 04 21:13:10 UTC 2023,INFO,Skipped Resource without RCR,Resource ID: 74481678010 / path: /dkrz_test/netcdf/20230925a/empty_file.txt
Wed Oct 04 21:13:10 UTC 2023,INFO,GNS Scan Started,GNS Scan for job 156548 is starting
Wed Oct 04 21:13:10 UTC 2023,INFO,GNS Scan Started,GNS Scan for job 156548 is starting
Wed Oct 04 21:13:10 UTC 2023,INFO,GNS Scan Started,GNS Scan for job 156548 is starting
Wed Oct 04 21:13:10 UTC 2023,INFO,GNS Scan Started,GNS Scan for job 156548 is starting
Wed Oct 04 21:13:10 UTC 2023,INFO,GNS Scan Started,GNS Scan for job 156548 is starting
Wed Oct 04 21:13:10 UTC 2023,INFO,GNS Scan Started,GNS Scan for job 156548 is starting
Wed Oct 04 20:56:18 UTC 2023,INFO,New job starting,Job 156548 is starting
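The raw report above consists of two parts: `key,value` lines with job metadata, followed by a table that starts at the header line `create date,status,message,description`. The Python sketch below splits a report into these two parts using only the standard library. It assumes that only the last column (description) may contain commas; the function name `parse_job_report` is hypothetical, and the sample is an abbreviated copy of the report above.

```python
TABLE_HEADER = "create date,status,message,description"
COLUMNS = tuple(TABLE_HEADER.split(","))


def parse_job_report(text):
    """Split a raw job report into (summary dict, list of event dicts)."""
    summary = {}
    events = []
    in_table = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == TABLE_HEADER:
            in_table = True
            continue
        if in_table:
            # Split into at most four fields; extra commas stay in the description.
            fields = line.split(",", 3)
            if len(fields) == 4:
                events.append(dict(zip(COLUMNS, fields)))
        else:
            key, _, value = line.partition(",")
            summary[key] = value
    return summary, events


# Abbreviated copy of the report printed above.
SAMPLE_REPORT = """\
job_id,156548
policy_type,VERIFY
job_status,COMPLETED
io_verify_size_failed,5
create date,status,message,description
Wed Oct 04 21:14:33 UTC 2023,INFO,IO completed,The IO for job 156548 has completed
Wed Oct 04 21:13:11 UTC 2023,ERROR,File verification failed: File not found,File not found: /dkrz_test/netcdf/20230925a/file_001gb_g.nc
Wed Oct 04 21:13:10 UTC 2023,INFO,Skipped Resource without RCR,Resource ID: 74481678010 / path: /dkrz_test/netcdf/20230925a/empty_file.txt
"""

if __name__ == "__main__":
    summary, events = parse_job_report(SAMPLE_REPORT)
    print(summary["job_status"])
    for event in events:
        if event["status"] == "ERROR":
            print(event["message"], "|", event["description"])
```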
Evaluating a verification report#
An example verification report is printed in the previous section.
The lines below the header create date,status,message,description are of interest. They are in CSV format, so you can load them into the spreadsheet software of your choice or into a Python pandas.DataFrame. Below we explain what certain status,message combinations mean and what should be done:
ERROR,File verification failed: Resource not stored in pool(s): [37]
    please ignore this error; it indicates that the file has already been written to tape and deleted from the HSM cache
ERROR,File verification failed: Resource content size does not match record
    please archive the file again
ERROR,File verification failed: File not found
    contact support@dkrz.de directly
INFO,Skipped Resource without RCR
    printed in different situations; please check which of the following cases applies:
        nothing to do if the target file has a size of 0 bytes and this is intended
        please archive the file again if the target file has a size of 0 bytes but should be larger
        contact support@dkrz.de directly if a size greater than 0 bytes is printed
INFO,Skipped Soft-Deleted Resource
    please ignore this information; the target file has been marked for deletion (deleted from the user's perspective) but has not been cleaned up yet
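The status,message combinations listed in this section can be turned into a small lookup for scripted evaluation of a report. The Python sketch below matches the message by its prefix, because some messages end with variable parts such as the pool list. The names `RULES` and `recommend_action` are hypothetical, and the action texts are short paraphrases of the explanations in this section.

```python
# (status, message prefix, recommended action) as explained in this section.
RULES = [
    ("ERROR", "File verification failed: Resource not stored in pool(s)",
     "ignore: file is already on tape and was removed from the HSM cache"),
    ("ERROR", "File verification failed: Resource content size does not match record",
     "archive the file again"),
    ("ERROR", "File verification failed: File not found",
     "contact support@dkrz.de"),
    ("INFO", "Skipped Resource without RCR",
     "check the file size: 0 bytes and intended -> nothing to do; "
     "0 bytes but should be larger -> archive again; "
     "larger than 0 bytes -> contact support@dkrz.de"),
    ("INFO", "Skipped Soft-Deleted Resource",
     "ignore: file is marked for deletion but not cleaned up yet"),
]


def recommend_action(status, message):
    """Return the recommended action for a status,message combination."""
    for rule_status, prefix, action in RULES:
        if status == rule_status and message.startswith(prefix):
            return action
    # Combinations not covered in this section are left for manual inspection.
    return "not covered here: inspect the report line manually"


if __name__ == "__main__":
    print(recommend_action("ERROR", "File verification failed: File not found"))
    print(recommend_action("INFO", "IO completed"))
```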