Reference: StrongLink query language#
file version: 08 Dec 2023
Introduction#
Searches are defined via JSON-formatted search queries and are performed via slk search. Queries consist of operators (e.g. $and, $gt, $regex), metadata fields and values. Please see Reference: metadata schemata for metadata fields that can be used in search queries. The query structure, available operators and example queries are described on this page. Several example search queries are provided in slk Usage Examples. Please see the page Metadata for a general overview.
Query structure#
Search queries in StrongLink have to be written in a special query language developed by StrongLink. Its syntax uses the JSON data interchange format (JSON = JavaScript Object Notation).
A search operation is a key-value pair surrounded by {} (e.g. '{"key": "value"}'). The key can be a metadata field (e.g. resources.name or document.Author) or an operator (e.g. $gt (>) or $and (logical and)). All available operators are listed in table Query Operators. The value can be a value of a common data type (string, integer, floating point number, date, boolean), another search operation or a list of these. In this way, search operations can be nested.
Commonly, search queries have these structures:
search_query = {metadata_field: value}
search_query = {metadata_field: {operator: value}}
search_query = {operator_linking_several_queries: [search_query_1, search_query_2, ...]}
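The three structures above map directly onto nested dictionaries and lists. The following Python sketch builds one query of each form and serializes it with the standard json module. The field values (the file name test.nc and the author Jane Doe) are hypothetical and only serve as illustration; to_query_string is a helper name introduced here, not part of slk.

```python
import json

# Form 1: {metadata_field: value}
simple = {"resources.name": "test.nc"}  # hypothetical file name

# Form 2: {metadata_field: {operator: value}}
with_operator = {"resources.name": {"$regex": "nc$"}}

# Form 3: {operator_linking_several_queries: [query_1, query_2, ...]}
combined = {
    "$and": [
        {"document.Author": "Jane Doe"},  # hypothetical author
        with_operator,
    ]
}


def to_query_string(query: dict) -> str:
    """Serialize a nested query into the JSON string passed to slk search."""
    return json.dumps(query)


for q in (simple, with_operator, combined):
    print(to_query_string(q))
```

Building the query as a data structure and serializing it at the end avoids hand-written quoting mistakes such as a missing colon or an unquoted operator name.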
Query operators#
This table is copied from the StrongLink Command Line Interface Guide.
Operator | Description |
---|---|
$ | Projects the first element in an array that matches the query. |
$all | Finds arrays that contain all elements specified in the query. |
$and | Joins query clauses with a logical AND; finds resources that match the conditions of both clauses. |
$bitsAllClear | Finds numeric or binary values in which a set of bit positions all have a value of 0. |
$bitsAllSet | Finds numeric or binary values in which a set of bit positions all have a value of 1. |
$bitsAnyClear | Finds numeric or binary values in which any bit from a set of bit positions has a value of 0. |
$bitsAnySet | Finds numeric or binary values in which any bit from a set of bit positions has a value of 1. |
$comment | Adds a comment to a query. |
$elemMatch | Projects the first element in an array that matches the specified $elemMatch condition. |
$eq | Finds values that are equal to a specified value. |
$exists | Finds resources that have the specified field. |
$expr | Allows use of aggregation expressions in the query language. |
$geoIntersects | Selects geometries that intersect with a GeoJSON geometry. |
$geoWithin | Selects geometries in a bounding GeoJSON geometry. |
$gt | Finds values that are greater than a specified value. |
$gte | Finds values that are greater than or equal to a specified value. For usage with path, see the note below the table. |
$in | Finds any of the values specified in an array. |
$jsonSchema | Validates documents against a specified JSON schema. |
$lt | Finds values that are less than a specified value. |
$lte | Finds values that are less than or equal to a specified value. |
$max_depth | To be used in combination with $gte when searching in a path; limits the recursion depth (see the note below the table). |
$meta | Projects the document’s score assigned during $text operation. |
$mod | Performs a modulo operation on the value of a field and selects documents with a specified result. |
$ne | Finds all values that are not equal to a specified value. |
$near | Returns geospatial objects in proximity to a point. Requires a geospatial index. |
$nearSphere | Returns geospatial objects in proximity to a point on a sphere. Requires a geospatial index. |
$nin | Finds none of the values specified in an array. |
$nor | Joins query clauses with a logical NOR; finds resources that fail to match both clauses. |
$not | Finds resources that do not match the query expression. |
$or | Joins query clauses with a logical OR; finds resources that match the conditions of either clause. |
$regex | Selects documents where values match a specified regular expression. |
$size | Finds resources if an array field is a specified size. |
$slice | Limits the number of elements projected from an array. Supports skip and limit slices. |
$text | Performs text search. |
$type | Finds resources if a field is of the specified type. |
$where | Finds resources that satisfy a JavaScript expression. |
Note
$gte is the only operator that can be used to search in a path. By default, $gte does a recursive search through all subfolders. A second operator, $max_depth, limits the recursion depth (depth of the subfolder tree) and can only be used in this context. "$max_depth": 1 means non-recursive search. An example is given below.
Special cases#
path metadata field#
The path metadata field has to be used in combination with the $gte operator.
{"path": {"$gte": "PATH"}}
The search in PATH is recursive by default. For a non-recursive search, the operator $max_depth has to be used as follows:
{"path": {"$gte": "PATH", "$max_depth": 1}}
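The two path queries above differ only in the optional $max_depth key, so they can be produced by one small helper. This is a minimal Python sketch; path_query and its max_depth parameter are names introduced here for illustration, and the namespace /arch/PROJ/USER/test is the placeholder used elsewhere on this page.

```python
import json
from typing import Optional


def path_query(path: str, max_depth: Optional[int] = None) -> dict:
    """Build a query for resources below a namespace path.

    $gte on the path field searches recursively by default; setting
    max_depth adds "$max_depth", where 1 means a non-recursive search.
    """
    condition = {"$gte": path}
    if max_depth is not None:
        condition["$max_depth"] = max_depth
    return {"path": condition}


recursive = path_query("/arch/PROJ/USER/test")
flat = path_query("/arch/PROJ/USER/test", max_depth=1)

print(json.dumps(recursive))
print(json.dumps(flat))
```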
$and, $or and $nor#
If one of the operators $and, $or and $nor is used as key, then the value needs to be a list of queries:
slk search '{"$and": [query1, query2, ...]}'
These are a few usage examples:
# search for netCDF files with the global attribute :title set to "great dataset" or "ultimate dataset":
slk search '{"$or": [{"netcdf.Title": "great dataset"}, {"netcdf.Title": "ultimate dataset"}]}'
# alternatively, we might substitute the "$or" as follows:
slk search '{"netcdf.Title": {"$in": ["great dataset", "ultimate dataset"]}}'
# search for official CMIP6 netCDF files in monthly time frequency and with a variable named ``pr`` or ``tas``
# note: we need the "$regex" operator because there is no other operator to check whether a string contains a sub-string;
# "netcdf.Var_Name" is a string which contains a comma-separated list of all variables in the particular file
slk search '{"$and": [{"netcdf.Project": "CMIP6"}, {"netcdf_header.Frequency": "mon"}, {"$or": [{"netcdf.Var_Name": {"$regex": "pr"}}, {"netcdf.Var_Name": {"$regex": "tas"}}]}]}'
# instead of the "$or" we can also write:
slk search '{"$and": [{"netcdf.Project": "CMIP6"}, {"netcdf_header.Frequency": "mon"}, {"netcdf.Var_Name": {"$regex": "pr|tas"}}]}'
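Because netcdf.Var_Name holds a comma-separated list, an unanchored pattern such as pr may also match variable names that merely contain those letters (for example prw). The Python sketch below uses the re module to compare the pattern from the example with a stricter, delimiter-aware one. It assumes that $regex behaves like an unanchored regular-expression search with syntax similar to Python's, which should be checked against the StrongLink documentation; the sample strings and the exact list format (commas, optional spaces) are hypothetical.

```python
import json
import re

# Hypothetical netcdf.Var_Name values (comma-separated variable lists).
samples = ["pr", "tas,pr", "prw,tasmax", "uas,vas"]

loose = "pr|tas"                      # pattern used in the example
strict = r"(^|,)\s*(pr|tas)\s*(,|$)"  # matches whole list entries only

loose_hits = [s for s in samples if re.search(loose, s)]
strict_hits = [s for s in samples if re.search(strict, s)]

print(loose_hits)   # includes "prw,tasmax" because "pr" is a substring of "prw"
print(strict_hits)  # only lists that contain pr or tas as a complete entry

# json.dumps doubles the backslashes, as JSON string syntax requires.
query = {
    "$and": [
        {"netcdf.Project": "CMIP6"},
        {"netcdf_header.Frequency": "mon"},
        {"netcdf.Var_Name": {"$regex": strict}},
    ]
}
print(json.dumps(query))
```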
searching for files in a certain namespace#
# search for all netCDF files (*.nc) in /arch/PROJ/USER/test and its subfolders
slk search '{"$and": [{"path": {"$gte": "/arch/PROJ/USER/test"}}, {"resources.name": {"$regex": "nc$"}}]}'
# search for all netCDF files in /arch/PROJ/USER/test but not in its subfolders
slk search '{"$and": [{"path": {"$gte": "/arch/PROJ/USER/test", "$max_depth": 1}}, {"resources.name": {"$regex": "nc$"}}]}'
# if we list the search results of the second search, it will be equal to
slk list /arch/PROJ/USER/test/*.nc | cat
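When such a search is started from a script, the query has to reach slk search as one single argument. The following Python sketch assembles the non-recursive search from above and prints a shell-quoted command line. netcdf_search_args is a helper name introduced here for illustration; the commented-out subprocess call assumes that slk is installed and that a valid login exists.

```python
import json
import shlex
from typing import List


def netcdf_search_args(namespace: str, recursive: bool = True) -> List[str]:
    """Return the argument list for searching *.nc files in a namespace."""
    path_condition = {"$gte": namespace}
    if not recursive:
        path_condition["$max_depth"] = 1  # 1 means: do not descend into subfolders
    query = {
        "$and": [
            {"path": path_condition},
            {"resources.name": {"$regex": "nc$"}},
        ]
    }
    return ["slk", "search", json.dumps(query)]


args = netcdf_search_args("/arch/PROJ/USER/test", recursive=False)

# shlex.join adds the single quotes needed to paste the command into a shell.
print(shlex.join(args))

# To run the search directly (requires slk and a valid login):
# import subprocess
# subprocess.run(args, check=True)
```

Passing the arguments as a list to subprocess.run sidesteps shell quoting entirely, which matters because the query contains both double quotes and $ characters.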
Generate RQL Queries#
We provide two slk_helpers commands which generate RQL query strings for you. You can use the generated strings directly for searches or as a basis to construct more advanced search queries. The commands are:
slk_helpers gen_file_query: accepts a list of files and/or namespaces and constructs a search for/within them
slk_helpers gen_search_query: accepts key-value pairs of the form <metadata_field>=<value_to_search_for> as conditions and constructs a search query string
Please find details and help in the section Generate search queries of the usage examples page or on the page File Search and Metadata.