Metadata and File Search
file version: 04 April 2022
Metadata schema in StrongLink
Files can be searched, found and retrieved based on their metadata. Metadata are stored in metadata fields, e.g. title. Each metadata field is part of one metadata schema. Several metadata schemata might have fields with the same name but with different content. Basic file metadata, e.g. owner and size, are automatically extracted from any archived file and stored in the schema resources. Depending on the file type, additional file-type-specific metadata are automatically extracted. At the moment, this feature is enabled for files of type NetCDF, with the corresponding schemata netcdf and netcdf_header detailed below. All available metadata schemata, their content and the file types to which they are applied are listed in our Metadata schema reference.
Searches are defined via JSON-formatted search queries and are performed via slk search and slk_helpers search_limited. Currently, slk search is deactivated due to an internal technical issue. Please only use slk_helpers search_limited for now. We will inform the DKRZ users when slk search is activated. Operators are used to define metadata queries in the system in order to find specific data. Details on search queries and operators are provided in Reference: StrongLink query language. Several example search queries are provided in slk Usage Examples.
Note
A file or namespace might be associated not with just one metadata schema but with an arbitrary number of metadata schemata, e.g. document, example_schema_abc and example_schema_xyz. One metadata field might exist multiple times across several metadata schemata, e.g. document.Author, example_schema_abc.Author and example_schema_xyz.Author. A file associated with these three exemplary metadata schemata might have three different values for *.Author, e.g. document.Author: "Max Mustermann", example_schema_abc.Author: "Maxima Musterfrau" and example_schema_xyz.Author: "Mr. and Mrs. Muster".
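Because the same field name can occur in several schemata, the queries in this document always address a field by its schema-qualified name (schema.field). The following is a minimal sketch of how such a one-field query string could be assembled in bash. The function name field_query is hypothetical and not part of slk or slk_helpers, and the sketch assumes that the value is a plain string without control characters.

```shell
#!/usr/bin/env bash
# field_query is a hypothetical helper, not part of slk or slk_helpers.
# It prints a query of the form {"<schema>.<field>":"<value>"}.
field_query() {
    local schema=$1 field=$2 value=$3
    # Escape backslashes first, then double quotes, so the value stays valid JSON.
    value=${value//\\/\\\\}
    value=${value//\"/\\\"}
    printf '{"%s.%s":"%s"}\n' "$schema" "$field" "$value"
}

# The same field name, addressed in two different schemata.
field_query document Author "Max Mustermann"
field_query example_schema_abc Author "Maxima Musterfrau"
```

Each call prints a separate query, so the two Author fields are never confused with each other.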
Set metadata
Users can modify the content of all metadata fields that are part of an extended metadata schema. This is done via slk tag.
Examples:
slk tag /tape/arch/bm0146/k204221/test_files document.Author="Mustermann, Max"
Note
A part of the netCDF metadata (mainly global attributes) is copied into an extended metadata schema. The full metadata extracted from netCDF files will be stored in a special format and will be read-only. Modifying the metadata in the extended netCDF metadata schema will not modify the read-only metadata.
Currently, individual files cannot be specified in slk tag. However, a search (see above) can be defined and the search ID can be used as input for slk tag. Please have a look at the slk Usage Examples for detailed examples.
Print metadata
The slk command meant for this purpose is not available yet.
Search files by metadata (deactivated)
Note
Currently, slk search is not available due to an internal technical issue. Please use slk_helpers search_limited instead until slk search becomes fully available. We will inform the DKRZ users accordingly.
The command slk search allows users to search for files by their metadata. Users can either search for file name, user name and group name via simple flags or formulate complex search queries on all available metadata fields. Search queries in StrongLink have to be written in a special query language whose structure follows JSON. The output of the search request is a search_ID. The search_ID is used as input to slk list or slk retrieve in order to print or retrieve the results, respectively.
A few slk search examples:
# search for "Max" as value in the metadata field "Producer" of the schema "image"
$ slk search {\"image.Producer\":\"Max\"}
Search continuing. .....
Search ID: 9
$ slk list 9
# alternatively, use slk_helpers search_limited
$ slk_helpers search_limited {\"image.Producer\":\"Max\"}
...
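In scripts, the search ID usually has to be picked out of the command output before it can be handed to slk list or slk retrieve. The following is a minimal sketch that parses output in the format shown above ("Search ID: 9"). The function name extract_search_id is hypothetical, and the sketch assumes that slk_helpers search_limited prints the ID in the same format as slk search, which the example above does not show.

```shell
#!/usr/bin/env bash
# extract_search_id is a hypothetical helper, not part of slk or slk_helpers.
# It reads command output on stdin and prints the number from the last
# line of the form "Search ID: <number>"; it prints nothing if no such line exists.
extract_search_id() {
    sed -n 's/^Search ID:[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' | tail -n 1
}

# Sample output copied from the slk search example above.
sample_output='Search continuing. .....
Search ID: 9'

search_id=$(printf '%s\n' "$sample_output" | extract_search_id)
echo "$search_id"
```

With a real search, the sample text would be replaced by the output of the search command, and the captured value could then be passed on, e.g. slk list "$search_id".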
Further query examples are given below. Available query operators are given in the Reference: StrongLink query language. See also the StrongLink Command Line Interface Guide from page 6 onwards.
Print a search query in a human-readable way
Suppose we have the following search query and want to analyze it:
slk_helpers search_limited '{"$and": [{"resources.name": "INDEX.txt"}, {"$or": [{"$and": [{"resources.posix_uid": 25301}, {"path": {"$gte": "/arch"}}]}, {"path": {"$gte": "/double/bm0146"}}]}]}'
The search queries are written in JSON. You can use jq to print the search queries in a human-readable way:
$ echo '{"$and": [{"resources.name": "INDEX.txt"}, {"$or": [{"$and": [{"resources.posix_uid": 25301}, {"path": {"$gte": "/arch"}}]}, {"path": {"$gte": "/double/bm0146"}}]}]}' | jq
{
  "$and": [
    {
      "resources.name": "INDEX.txt"
    },
    {
      "$or": [
        {
          "$and": [
            {
              "resources.posix_uid": 25301
            },
            {
              "path": {
                "$gte": "/arch"
              }
            }
          ]
        },
        {
          "path": {
            "$gte": "/double/bm0146"
          }
        }
      ]
    }
  ]
}
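Since a query is plain JSON, jq can also serve as a quick syntax check before a query is submitted, which catches unbalanced braces or quotes early. The following is a minimal sketch; the function name is_valid_query is hypothetical, and the check only covers JSON syntax, not whether the fields or operators exist in StrongLink.

```shell
#!/usr/bin/env bash
# is_valid_query is a hypothetical helper, not part of slk or slk_helpers.
# It returns 0 if its argument is non-empty and parses as JSON, non-zero otherwise.
is_valid_query() {
    # jq accepts empty input without complaint, so reject it explicitly.
    [ -n "$1" ] || return 1
    printf '%s' "$1" | jq . > /dev/null 2>&1
}

query='{"$and": [{"resources.name": "INDEX.txt"}, {"path": {"$gte": "/arch"}}]}'
if is_valid_query "$query"; then
    echo "query is valid JSON"
else
    echo "query is not valid JSON (or jq is not installed)" >&2
fi
```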
Example queries with explanations
The examples are partly taken from the StrongLink Command Line Interface Guide.
Query | Purpose |
---|---|
| Find files greater than one megabyte (sizes are in bytes) |
| Find files in a specific namespace (recursively) |
| Find files in a specific namespace (non-recursively) |
| Find files of a specific MIME type |
| Find files for a specific UID |
| Find files for a specific GID |
| Find files modified since a specific date |
| Find files based on user-defined metadata. The user-defined schema and field name are the field. For example, if querying by the |
| Find files of user k204221 (who has UID 25301) |
| Find images which metadata field |
| Search for all files with the name |
| Search for all files whose names match the regular expression |
| Find files which belong to either user 24855 or user 25301 |
| Find files with the name |
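Queries that combine several conditions, such as the "either user 24855 or user 25301" case, can be assembled from simpler sub-queries. The following is a minimal sketch that joins sub-queries with the "$or" operator in the list form used earlier in this document. The function name or_query is hypothetical, and the resulting string is an illustration built from the field and operator names shown above, not a copy of the original table entry.

```shell
#!/usr/bin/env bash
# or_query is a hypothetical helper, not part of slk or slk_helpers.
# It wraps all of its arguments (each a JSON sub-query) in {"$or":[...]}.
or_query() {
    local joined="" sub
    for sub in "$@"; do
        # Prepend a comma for every sub-query except the first.
        joined+="${joined:+,}$sub"
    done
    printf '{"$or":[%s]}\n' "$joined"
}

or_query '{"resources.posix_uid":24855}' '{"resources.posix_uid":25301}'
```

Keeping the format string in single quotes prevents the shell from treating $or as a variable.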
Advanced query examples
# two types of delimiters
$ slk search '{"resources.size":{"$gt": 1048576}}'
$ slk search "{\"resources.size\":{\"\$gt\": 1048576}}"
# using shell variables in calls of slk search
# ~~~~~~~~~~~~~~~~~~~~ method one ~~~~~~~~~~~~~~~~~~~~
$ id k204221 -u
25301
$ slk search "{\"resources.posix_uid\":25301}"
...
# ~~~~~~~~~~~~~~~~~~~~ method two ~~~~~~~~~~~~~~~~~~~~
$ export uid=`id k204221 -u`
$ slk search "{\"resources.posix_uid\":$uid}"
...
# ~~~~~~~~~~~~~~~~~~~ method three ~~~~~~~~~~~~~~~~~~~
$ slk search "{\"resources.posix_uid\":`id k204221 -u`}"
...
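The escaped double quotes in the examples above are easy to get wrong. An alternative is to let printf insert the value into a single-quoted template. The following is a minimal sketch; the function name uid_query is hypothetical, and it assumes that the UID is a plain non-negative integer.

```shell
#!/usr/bin/env bash
# uid_query is a hypothetical helper, not part of slk or slk_helpers.
# It prints a query of the form {"resources.posix_uid":<uid>}.
uid_query() {
    local uid=$1
    # Reject empty input and anything that contains a non-digit.
    case $uid in
        ''|*[!0-9]*)
            echo "uid_query: not a numeric UID: '$uid'" >&2
            return 1
            ;;
    esac
    # 10# forces base 10 so that leading zeros are not read as octal.
    printf '{"resources.posix_uid":%d}\n' "$((10#$uid))"
}

# Example with the UID of the current user; the example user k204221
# from above would work the same way via "$(id -u k204221)".
query=$(uid_query "$(id -u)")
echo "$query"
```

The resulting string can be passed as a single quoted argument, e.g. slk_helpers search_limited "$query".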
Note
The example shell commands are meant for bash. If you are using csh or tcsh then they do not work as printed here but have to be adapted. Please contact DKRZ support (support@dkrz.de) if you require assistance.