Reference: metadata schemata#
file version: 08 Dec 2023
Introduction#
Files can be searched, found and retrieved based on their metadata. The metadata are stored in metadata fields. Each metadata field is part of one metadata schema. A search is defined by a JSON-formatted search query (JSON = JavaScript Object Notation) that is submitted to StrongLink via slk search. A search operation is a key-value pair surrounded by {} (e.g. '{"key": "value"}'). The key can be a metadata field (e.g. resources.name or document.Author) or an operator (e.g. $gt (>) or $and (logical and)).
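As a minimal sketch of the two kinds of keys described above, the following Python snippet builds one query whose key is a metadata field and one whose keys are operators, and turns each into a command line for slk search. The helper name slk_search_command, the file name and the size threshold are assumptions made for illustration; the nesting {"field": {"$operator": value}} follows the path examples on this page.

```python
import json
import shlex


def slk_search_command(query: dict) -> str:
    """Return a shell command line that submits `query` via `slk search`."""
    # json.dumps produces the JSON text; shlex.quote wraps it in single
    # quotes so that the shell passes it on unchanged.
    return "slk search " + shlex.quote(json.dumps(query))


# The key is a metadata field (the file name is a hypothetical value).
by_name = {"resources.name": "example.nc"}

# The keys are operators: $and combines conditions, $gt means "greater than".
# Applying $gt to resources.size is an assumption for illustration.
large_netcdf = {
    "$and": [
        {"resources.name": {"$regex": "nc$"}},
        {"resources.size": {"$gt": 1048576}},
    ]
}

print(slk_search_command(by_name))
print(slk_search_command(large_netcdf))
```

Building the query as a dictionary and serializing it avoids quoting mistakes that are easy to make when the JSON is typed by hand on the command line.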
See also
Details on search queries and operators are provided on Reference: StrongLink query language. Several example search queries are provided in slk Usage Examples. Please see the page Metadata for a general overview.
Basic file metadata, e.g. owner and size, are automatically extracted from any archived file and stored in the schema resources. Depending on the file type, additional file-type-specific metadata are automatically extracted as well. At the moment, this feature is enabled for netCDF files, with the corresponding schemata netcdf and netcdf_header detailed below. All metadata stored in the netcdf and netcdf_header metadata schemata can be modified manually by the user via slk tag (see Set metadata). All global attributes of a netCDF file and their values are stored read-only in the field netcdf.Data.
A resource might be associated with more than one metadata schema, e.g. document, example_schema_abc and example_schema_xyz. A metadata field of the same name might exist in several metadata schemata, e.g. document.Author, example_schema_abc.Author and example_schema_xyz.Author. A file associated with these three exemplary metadata schemata might have three different values for *.Author, e.g. document.Author: "Max Mustermann", example_schema_abc.Author: "Maxima Musterfrau" and example_schema_xyz.Author: "Mr. and Mrs. Muster". Changing a field in one schema does not change the same-named field in another schema. Thus, if netcdf.Title is modified manually, netcdf_header.Title should be modified manually as well.
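Since same-named fields in different schemata can drift apart, a small check can help after manual edits. The sketch below is an illustration only: it assumes the metadata of one file is already available as a flat Python dictionary with "schema.Field" keys (how that dictionary is obtained is not covered here), and the function name conflicting_fields is hypothetical.

```python
from collections import defaultdict


def conflicting_fields(metadata: dict) -> dict:
    """Return field names whose values differ between schemata."""
    by_field = defaultdict(dict)
    for key, value in metadata.items():
        # Split "netcdf.Title" into schema "netcdf" and field "Title".
        schema, _, field = key.partition(".")
        by_field[field][schema] = value
    return {
        field: values
        for field, values in by_field.items()
        if len(set(values.values())) > 1
    }


# Hypothetical metadata of one file after netcdf.Title was edited manually.
metadata = {
    "netcdf.Title": "Corrected title",
    "netcdf_header.Title": "Old title",
    "netcdf.Project": "PROJ",
    "netcdf_header.Project": "PROJ",
}
print(conflicting_fields(metadata))
```

In this example only Title is reported, because Project has the same value in both schemata.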
Warning
The names of metadata fields are case sensitive.
Basic StrongLink metadata schema#
Each file and namespace in StrongLink has basic metadata comparable to POSIX metadata. These metadata are listed in the table below.
Name | Type | Description |
---|---|---|
path | String | The namespace path in which the file exists |
resources._id | Int | The ID of the resource (database record for the file) |
resources.birth_time | date | The date and time the file was initially created (does not work) |
resources.created | date | The date and time the resource was created in StrongLink |
resources.mimetype | String | The MIME type of the file |
resources.mtime | date | The date and time the resource was last modified in StrongLink |
resources.name | String | The name of the file |
resources.parent_id | Int | The ID of the namespace in which the file is stored |
resources.posix_gid | Int | The POSIX GID for the file |
resources.posix_mode | Int | The POSIX mode for the file |
resources.posix_uid | Int | The POSIX UID for the file |
resources.rcr_mtime | date | The date and time the file’s contents were last modified |
resources.size | Int | The size of the file |
resources.version | Int | The version of the file |
smart_pool | String | Set to |
Note
StrongLink uses the term “namespace” or “global namespace” (gns). A “namespace” is comparable to a “directory” or “path” on a common file system.
Note
Using path without the operator $gte does not work. Also, $gte is the only comparison operator that can be used to search in a path. By default, $gte does a recursive search through all subfolders. A second operator, $max_depth, can be combined with $gte in this context only. It limits the recursion depth (depth of the subfolder tree). "$max_depth": 1 means non-recursive search. An example is given below.
# does NOT work to search for all netCDF files (*.nc) in /arch/PROJ/USER/test
slk search '{"$and": [{"path": "/arch/PROJ/USER/test"}, {"resources.name": {"$regex": "nc$"}}]}'
# search for all netCDF files (*.nc) in /arch/PROJ/USER/test and its subfolders
slk search '{"$and": [{"path": {"$gte": "/arch/PROJ/USER/test"}}, {"resources.name": {"$regex": "nc$"}}]}'
# search for all netCDF files in /arch/PROJ/USER/test but not in its subfolders
slk search '{"$and": [{"path": {"$gte": "/arch/PROJ/USER/test", "$max_depth": 1}}, {"resources.name": {"$regex": "nc$"}}]}'
# if we list the results of the last search (the one with "$max_depth": 1), the output will be equal to that of
slk list /arch/PROJ/USER/test/*.nc | cat
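The path queries above differ only in whether "$max_depth" is present, so they can be generated by one small helper. The following is a minimal Python sketch; the function name path_query and its parameters are assumptions for illustration, while the query structure is the one used in the examples above.

```python
import json
from typing import Optional


def path_query(path: str, name_regex: str, max_depth: Optional[int] = None) -> str:
    """Build a JSON query for files below `path` whose name matches `name_regex`."""
    # A path can only be searched with $gte (recursive by default).
    path_condition = {"$gte": path}
    if max_depth is not None:
        # $max_depth limits the recursion depth; 1 means non-recursive.
        path_condition["$max_depth"] = max_depth
    query = {
        "$and": [
            {"path": path_condition},
            {"resources.name": {"$regex": name_regex}},
        ]
    }
    return json.dumps(query)


# Recursive search, then a search restricted to the folder itself.
print(path_query("/arch/PROJ/USER/test", "nc$"))
print(path_query("/arch/PROJ/USER/test", "nc$", max_depth=1))
```

The printed strings can be passed to slk search inside single quotes, as in the shell examples above.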
schema: netcdf#
This schema contains core metadata of each archived netCDF file and summarizes the most relevant information of the file for human viewers. Please see the table below for available metadata fields. The column Source indicates where the information comes from (details in the next paragraphs). Except for the field netcdf.Errata, all fields are filled automatically if the corresponding attributes are used in the netCDF file. The content of all metadata fields can be edited manually with slk tag and slk_helpers json2hsm. This will not update the archived netCDF file.
Details:
Most metadata fields are filled with the content of global attributes. These are indicated by : as first character in the column Source of the table below.
Some metadata fields are mapped to more than one global attribute. If (concatenate) is written at the end of the column Source, the content of these attributes (when available) is concatenated to a comma-separated list. If (concatenate) is not present, the leftmost listed global attribute that is available in the netCDF file is used as source for the respective metadata field.
netcdf.Time_Min and netcdf.Time_Max are extracted from the time variable. Currently, only a variable with the name time is recognized as time variable.
netcdf.Var_Long_Name and netcdf.Var_Std_Name contain a comma-separated list of the values of the variable attributes long_name and standard_name, respectively. Coordinate and auxiliary coordinate variables are also considered.
netcdf.Var_Name is a comma-separated list of all variables in the netCDF file.
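The rule for fields that are mapped to several global attributes can be sketched in a few lines of Python. This is an illustration of the rule as described above, not the actual import code: the function name resolve_field is hypothetical, attribute names are matched exactly, and a plain "," without a following space is assumed as separator.

```python
from typing import Optional


def resolve_field(global_attrs: dict, sources: list, concatenate: bool = False) -> Optional[str]:
    """Pick the value of a metadata field from candidate global attributes."""
    # Keep the candidates that exist in the file, in left-to-right order.
    available = [str(global_attrs[name]) for name in sources if name in global_attrs]
    if not available:
        return None
    if concatenate:
        # "(concatenate)": join all available values into one list.
        return ",".join(available)
    # Otherwise the leftmost available attribute wins.
    return available[0]


# Hypothetical global attributes of a netCDF file.
attrs = {
    "institute": "Example Institute",
    "contact": "Jane Doe",
    "contact_email": "jane.doe@example.org",
}
print(resolve_field(attrs, ["institution", "institute"]))
print(resolve_field(attrs, ["contact", "contact_email"], concatenate=True))
```

Here netcdf.Institution would fall back to the attribute institute because institution is absent, while netcdf.Contact would combine both contact attributes.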
Files with these MIME-types are considered as “netcdf”:
x-netcdf
x-hdf (if netCDF-4 format)
schema and metadata import as it should work#
Name | Type | Description | Source |
---|---|---|---|
netcdf.Creation_Date | string | Creation Date | :creation_date, :date_created |
netcdf.Pid | string | ID of Data Set | :doi, :id, :tracking_id |
netcdf.License | string | License | :license, :licence |
netcdf.Creator | string | Creator | :creator, :originator, :creator_name, :creator_email, :creator_url, :creator_type, :creator_institution (concatenate) |
netcdf.Project | string | Project Identifier | :project, :mip_era, :project_id |
netcdf.Institution | string | Institution | :institution, :institute |
netcdf.Institution_Id | string | Institution Identifier | :institution_id, :institute_id, :centre, :center |
netcdf.Source | string | Source | :source, :model |
netcdf.Realm | string | Realm | :realm, :model_realm |
netcdf.Experiment_Id | string | Experiment Identifier | :experiment_id, :expid, :exp_id |
netcdf.External_Description | string | External_Description | :metadata_link, :further_info_url |
netcdf.Contact | string | Contact | :contact, :contact_email (concatenate) |
netcdf.Errata | string | Errata | |
netcdf.Model_Git_Hash | string | Model Git Hash | :git_hash, :model_version, :hash |
netcdf.Title | string | Title of Data Set | :title, :titel |
netcdf.Var_Long_Name | string | Var Long Name | concatenation of the |
netcdf.Var_Name | string | Var Name | :concatenation of all variable names (comma separated) |
netcdf.Var_Std_Name | string | Var Std Name | concatenation of the |
netcdf.Time_Min | date | Time Min | first value of |
netcdf.Time_Max | date | Time Max | last value of |
netcdf.Type | string | Type | :type, :dataType |
netcdf.Class | string | Class | :class |
netcdf.Data | special | all global attributes | :* |
schema: netcdf_header#
This schema contains more than 100 metadata fields, should cover the most common netCDF metadata and is meant to allow for automated evaluation. All global attributes that have the same name as a metadata field are ingested automatically. The mapping is case-insensitive. Please see the table below for available metadata fields. All of these metadata fields can also be filled manually with slk tag and slk_helpers json2hsm. This will not update the archived netCDF file. In the background, all global attributes are stored but cannot be searched by users.
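The case-insensitive mapping of global attribute names to netcdf_header fields can be pictured with the following Python sketch. It only illustrates the matching rule described above and is not the actual ingest code; the function name ingest_header and the sample attributes are assumptions.

```python
def ingest_header(global_attrs: dict, field_names: list) -> dict:
    """Map global attributes to netcdf_header fields, ignoring case."""
    # Index the schema's field names by their lowercase form.
    lookup = {name.lower(): name for name in field_names}
    result = {}
    for attr, value in global_attrs.items():
        field = lookup.get(attr.lower())
        if field is not None:
            result["netcdf_header." + field] = value
        # Attributes without a matching field get no searchable field.
    return result


# A few field names from the table below and hypothetical file attributes.
fields = ["Title", "Experiment_Id", "Conventions"]
attrs = {"title": "Example run", "EXPERIMENT_ID": "historical", "history": "created today"}
print(ingest_header(attrs, fields))
```

The attribute history has no same-named field among the three sample fields, so it does not appear in the result.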
If metadata fields that are important for your use case are missing, please contact us. We will collect these proposals for the next revision of the metadata schema.
Files with these MIME-types are considered as “netcdf_header”:
x-netcdf
x-hdf (if netCDF-4 format)
schema and metadata import as it should work#
Name | Type | Description |
---|---|---|
netcdf_header.Sub_Experiment_Id | string | Sub Experiment Id |
netcdf_header.Summary | string | Summary |
netcdf_header.Table_Id | string | Table Id |
netcdf_header.Target_Mip | string | Target Mip |
netcdf_header.Time_Coverage_Duration | string | Time Coverage Duration |
netcdf_header.Time_Coverage_End | string | Time Coverage End |
netcdf_header.Time_Coverage_Resolution | string | Time Coverage Resolution |
netcdf_header.Time_Coverage_Start | string | Time Coverage Start |
netcdf_header.Time_Min | date | Time Min |
netcdf_header.Time_Max | date | Time Max |
netcdf_header.Data_Specs_Version | string | Data Specs Version |
netcdf_header.Title | string | Title |
netcdf_header.Dataset_Category | string | Dataset Category |
netcdf_header.Tracking_Id | string | Tracking Id |
netcdf_header.Dataset_Version_Number | string | Dataset Version Number |
netcdf_header.Date_Created | string | Date Created |
netcdf_header.Date_Issued | string | Date Issued |
netcdf_header.Date_Metadata_Modified | string | Date Metadata Modified |
netcdf_header.Date_Modified | string | Date Modified |
netcdf_header.Doi | string | Doi |
netcdf_header.Experiment | string | Experiment |
netcdf_header.Experiment_Id | string | Experiment Id |
netcdf_header.Activity_Id | string | Activity Id |
netcdf_header.Cdm_Data_Type | string | Cdm Data Type |
netcdf_header.Channel_File_Type | string | Channel File Type |
netcdf_header.Comment | string | Comment |
netcdf_header.Contributor_Name | string | Contributor Name |
netcdf_header.Conventions | string | Conventions |
netcdf_header.Creation_Date | string | Creation Date |
netcdf_header.Creator_Institution | string | Creator Institution |
netcdf_header.Creator_Name | string | Creator Name |
netcdf_header.Featuretype | string | Featuretype |
netcdf_header.Forcing_Index | string | Forcing Index |
netcdf_header.Frequency | string | Frequency |
netcdf_header.Further_Info_Url | string | Further Info Url |
netcdf_header.Gcm | string | Gcm |
netcdf_header.Gcm_Horizontal_Mode | string | Gcm Horizontal Mode |
netcdf_header.Gcm_Start_Date_Time | string | Gcm Start Date Time |
netcdf_header.Gcm_Timestep | string | Gcm Timestep |
netcdf_header.Gcm_Vertical_Mode | string | Gcm Vertical Mode |
netcdf_header.Gdnam | string | Gdnam |
netcdf_header.Geospatial_Bounds | string | Geospatial Bounds |
netcdf_header.Geospatial_Lat_Max | string | Geospatial Lat Max |
netcdf_header.Geospatial_Lat_Min | string | Geospatial Lat Min |
netcdf_header.Geospatial_Lat_Resolution | string | Geospatial Lat Resolution |
netcdf_header.Geospatial_Lat_Units | string | Geospatial Lat Units |
netcdf_header.Geospatial_Lon_Max | string | Geospatial Lon Max |
netcdf_header.Geospatial_Lon_Min | string | Geospatial Lon Min |
netcdf_header.Geospatial_Lon_Resolution | string | Geospatial Lon Resolution |
netcdf_header.Geospatial_Lon_Units | string | Geospatial Lon Units |
netcdf_header.Geospatial_Vertical_Max | string | Geospatial Vertical Max |
netcdf_header.Geospatial_Vertical_Min | string | Geospatial Vertical Min |
netcdf_header.Geospatial_Vertical_Positive | string | Geospatial Vertical Positive |
netcdf_header.Geospatial_Vertical_Resolution | string | Geospatial Vertical Resolution |
netcdf_header.Geospatial_Vertical_Units | string | Geospatial Vertical Units |
netcdf_header.Grid | string | Grid |
netcdf_header.Grid_Label | string | Grid Label |
netcdf_header.Grid_Resolution | string | Grid Resolution |
netcdf_header.Id | string | Id |
netcdf_header.Initialization_Index | string | Initialization Index |
netcdf_header.Institution | string | Institution |
netcdf_header.Institution_Id | string | Institution Id |
netcdf_header.Instrument | string | Instrument |
netcdf_header.Keywords | string | Keywords |
netcdf_header.Lat_Min | string | Lat Min |
netcdf_header.Lat_Max | string | Lat Max |
netcdf_header.Level_Min | string | Level Min |
netcdf_header.Level_Max | string | Level Max |
netcdf_header.License | string | License |
netcdf_header.Lon_Min | string | Lon Min |
netcdf_header.Lon_Max | string | Lon Max |
netcdf_header.Metadata_Link | string | Metadata Link |
netcdf_header.Mip_Era | string | Mip Era |
netcdf_header.Naming_Authority | string | Naming Authority |
netcdf_header.Nominal_Resolution | string | Nominal Resolution |
netcdf_header.Number_Of_Grid_Used | string | Number Of Grid Used |
netcdf_header.Parent_Activity_Id | string | Parent Activity Id |
netcdf_header.Parent_Experiment_Id | string | Parent Experiment Id |
netcdf_header.Parent_Mip_Era | string | Parent Mip Era |
netcdf_header.Parent_Source_Id | string | Parent Source Id |
netcdf_header.Parent_Time_Units | string | Parent Time Units |
netcdf_header.Parent_Variant_Label | string | Parent Variant Label |
netcdf_header.Physics_Index | string | Physics Index |
netcdf_header.Pid | string | Pid |
netcdf_header.Platform | string | Platform |
netcdf_header.Processing_Level | string | Processing Level |
netcdf_header.Product | string | Product |
netcdf_header.Product_Version | string | Product Version |
netcdf_header.Program | string | Program |
netcdf_header.Project | string | Project |
netcdf_header.Project_Id | string | Project Id |
netcdf_header.Tstep | string | Tstep |
netcdf_header.User_Name | string | User Name |
netcdf_header.Var-List | string | Var-List |
netcdf_header.Var_Long_Name | string | Var Long Name |
netcdf_header.Var_Name | string | Var Name |
netcdf_header.Var_Std_Name | string | Var Std Name |
netcdf_header.Variable_Id | string | Variable Id |
netcdf_header.Realization_Index | string | Realization Index |
netcdf_header.Variant_Label | string | Variant Label |
netcdf_header.Realm | string | Realm |
netcdf_header.Version | string | Version |
netcdf_header.References | string | References |
netcdf_header.Sdate | string | Sdate |
netcdf_header.Source | string | Source |
netcdf_header.Source_Id | string | Source Id |
netcdf_header.Source_Type | string | Source Type |
netcdf_header.Sub_Experiment | string | Sub Experiment |
schema: document#
Files with these MIME-types are considered as “document”:
msword
pdf
vnd.ms-excel
vnd.ms-office
vnd.ms-powerpoint
vnd.openxmlformats-officedocument.presentationml.presentation
vnd.openxmlformats-officedocument.spreadsheetml.sheet
vnd.openxmlformats-officedocument.wordprocessingml.document
Name | Type | Description |
---|---|---|
document.Author | String | An entity primarily responsible for creating the content of the resource |
document.Title | String | Name or other identifier (such as email address) of person who created the document |
document.Content creator | String | document’s creator: this could be the name of the application (e.g. OpenOffice) that created the original document |
document.Version | String | Free-form version |
document.Language | String | Language the document is written in |
document.Last modified by | String | Name or other identifier (such as email address) of person who last modified the document |
document.Revision | Int | Document revision number |
document.Pages | Int | Number of pages |
document.Paragraphs | Int | Number of paragraphs |
document.Words | Int | Number of words |
document.Characters | Int | Number of characters |
document.Keywords | String | Keywords |
document.Subject | String | Subject |
document.Creation Date | String | Creation Date |
schema: image#
Files with these MIME-types are considered as “image”:
application/gif
application/jpeg
application/png
application/tiff
application/x-ms-bmp
application/x-pcx
application/vnd.adobe.photoshop
Name | Type | Description |
---|---|---|
image.Width | int | Image’s width in pixels |
image.Height | int | Image’s height in pixels |
image.Orientation | String | Orientation of the image (e.g. Landscape) |
image.Compression | String | Compression format of the image |
image.Bits/pixel | int | Number of bits per pixel |
image.Pixel format | String | Color pixel format of the image |
image.Format version | String | Image format version |
image.Producer | String | Image producer |
image.Thumbnail size | int | Thumbnail size |
image.Compress bits/pixel | Decimal | Compressed bits per pixel |
image.Depth | Int | Image’s depth in pixels |
schema: video#
Files with these MIME-types are considered as “video”:
mp4
quicktime
x-flv
x-matroska
x-ms-asf
x-msvideo
Name | Type | Description |
---|---|---|
video.Width | int | Video’s width in pixels |
video.Height | Int | Video’s height in pixels |
video.Duration | String | Video’s length in hours, minutes and seconds |
video.Producer | String | Video’s content producer |
video.Compression | String | Compression format of the video |
schema: audio#
Files with these MIME-types are considered as “audio”:
basic
flac
mid
ogg
x-aiff
x-pn-realaudio
x-wav
Name | Type | Description |
---|---|---|
audio.Duration | String | Audio’s length in hours, minutes and seconds |
audio.Language | String | Language the audio is in |
audio.Channels | Int | Number of channels |
audio.Sample rate | Int | Sample rate |
audio.Compression | String | Compression format of the audio |
audio.Format version | String | Format version |
audio.Bit rate | Int | Bit rate (bits per second) |
audio.Bits/sample | Int | Bits per sample |
audio.Compression rate | Decimal | Compression rate |
schema: camera#
Files with these MIME-types are considered as “camera”:
jpeg
Name | Type | Description |
---|---|---|
camera.Camera aperture | Decimal | Camera aperture in decimals |
camera.Camera focal | Decimal | Camera aperture in decimals |
camera.Camera exposure | Decimal | Camera exposure in decimals |
camera.Model | String | Camera model |
camera.Manufacturer | String | Camera manufacturer |
camera.Shutter speed | Decimal | Length of time when the film or camera sensor is exposed to light |
camera.Aperture | Decimal | Length of aperture |
camera.Exposure bias | Decimal | Camera exposure bias |
camera.Focal length | Decimal | Camera focal length |
camera.Camera brightness | Decimal | Camera brightness |
camera.ISO speed | Int | Camera ISO speed |
camera.Binning | Int | Camera Binning |