
Metadata and File Search

file version: 04 April 2022

Metadata schema in StrongLink

Files can be searched, found and retrieved based on their metadata. Metadata are stored in metadata fields, e.g. title. Each metadata field is part of exactly one metadata schema, but several metadata schemata may contain fields with the same name and different content. Basic file metadata, e.g. owner and size, are automatically extracted from every archived file and stored in the schema resources. Depending on the file type, additional file-type-specific metadata are extracted automatically. At the moment, this feature is enabled for NetCDF files, with the corresponding schemata netcdf and netcdf_header. All available metadata schemata, their content and the file types to which they are applied are listed in our Metadata schema reference.

Searches are defined via JSON-formatted search queries and are performed with slk search and slk_helpers search_limited. Currently, slk search is deactivated due to an internal technical issue, so please use only slk_helpers search_limited for now. We will inform DKRZ users when slk search is activated. A query combines metadata fields with operators (for example $gte or $regex) to select the files of interest. Details on search queries and operators are provided in Reference: StrongLink query language. Several example search queries are provided in slk Usage Examples.

Note

A file or namespace can be associated with an arbitrary number of metadata schemata, not just one - e.g. document, example_schema_abc and example_schema_xyz. A metadata field of the same name may exist in several metadata schemata - e.g. document.Author, example_schema_abc.Author and example_schema_xyz.Author. A file associated with these three exemplary metadata schemata might have three different values for *.Author - e.g.: document.Author: "Max Mustermann", example_schema_abc.Author: "Maxima Musterfrau" and example_schema_xyz.Author: "Mr. and Mrs. Muster".

Set metadata

Users can modify the content of all metadata fields that are part of an extended metadata schema. This is done via slk tag.

Examples:

slk tag /tape/arch/bm0146/k204221/test_files document.Author="Mustermann, Max"

Note

A part of the netCDF metadata (mainly global attributes) is copied into an extended metadata schema. The full metadata extracted from netCDF files will be stored in a special format and will be read-only. Modifying the metadata in the extended netCDF metadata schema will not modify the read-only metadata.

Currently, individual files cannot be specified in slk tag. However, a search (see above) can be defined and its search ID can be used as input for slk tag. Please see the slk Usage Examples for detailed examples.

Print metadata

The slk command meant for this purpose is not available yet.

Search files by metadata (deactivated)

Note

Currently, slk search is not available due to an internal technical issue. Please use slk_helpers search_limited instead until slk search becomes fully available. We will inform DKRZ users accordingly.

The command slk search allows users to search for files by their metadata. Users can either search for file name, user name and group name via simple flags or formulate complex search queries on all available metadata fields. Search queries in StrongLink have to be written in a special query language whose structure follows JSON. The output of a search request is a search ID. The search ID is used as input to slk list or slk retrieve in order to print or retrieve the results, respectively.

A few slk search examples:

# search for "Max" as value in the metadata field "Producer" of the schema "image"
$ slk search {\"image.Producer\":\"Max\"}
Search continuing. .....
Search ID: 9
$ slk list 9

# alternatively, use slk_helpers search_limited
$ slk_helpers search_limited {\"image.Producer\":\"Max\"}
...

Further query examples are given below. Available query operators are listed in the Reference: StrongLink query language. See also the StrongLink Command Line Interface Guide from page 6 onwards.
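
Because the search ID is needed for the follow-up commands, scripts usually have to pick it out of the search output. The following bash sketch does this for output formatted like the slk search example above ("Search ID: 9"). It is an assumption that slk_helpers search_limited prints its ID in the same format, and extract_search_id is a hypothetical helper name, not part of slk.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of slk): read search output on stdin and
# print the first numeric ID found on a line of the form "Search ID: <n>".
# Prints nothing if no such line exists.
extract_search_id() {
    sed -n 's/^Search ID: \([0-9][0-9]*\)[[:space:]]*$/\1/p' | head -n 1
}

# Intended use (commented out because it needs access to StrongLink;
# assumes the output format shown for slk search):
#   search_id=$(slk_helpers search_limited '{"image.Producer":"Max"}' | extract_search_id)
#   slk list "$search_id"
```

Checking that the captured ID is non-empty before calling slk list or slk retrieve avoids passing an empty argument when the search fails.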

Print a search query in a human-readable way

Suppose we want to analyze the following search query:

slk_helpers search_limited '{"$and": [{"resources.name": "INDEX.txt"}, {"$or": [{"$and": [{"resources.posix_uid": 25301}, {"path": {"$gte": "/arch"}}]}, {"path": {"$gte": "/double/bm0146"}}]}]}'

The search queries are written in JSON. You can use jq to print the search queries in a human-readable way:

$ echo '{"$and": [{"resources.name": "INDEX.txt"}, {"$or": [{"$and": [{"resources.posix_uid": 25301}, {"path": {"$gte": "/arch"}}]}, {"path": {"$gte": "/double/bm0146"}}]}]}' | jq
{
    "$and": [
        {
            "resources.name": "INDEX.txt"
        },
        {
            "$or": [
                {
                    "$and": [
                        {
                            "resources.posix_uid": 25301
                        },
                        {
                            "path": {
                                "$gte": "/arch"
                            }
                        }
                    ]
                },
                {
                    "path": {
                        "$gte": "/double/bm0146"
                    }
                }
            ]
        }
    ]
}

Example queries with explanations

The examples are partly taken from the StrongLink Command Line Interface Guide.

Example queries copied from the manual.

| Query | Purpose |
|---|---|
| {"resources.size":{"$gte": 1048576}} | Find files of at least 1 MiB (1048576 bytes; sizes are given in bytes) |
| {"path":{"$gte":"/arch/project"}} | Find files in a specific namespace (recursively) |
| {"path":{"$gte":"/arch/project", "$max_depth": 1}} | Find files in a specific namespace (non-recursively) |
| {"resources.mimetype":"image/jpeg"} | Find files of a specific MIME type |
| {"resources.posix_uid":999} | Find files for a specific UID |
| {"resources.posix_gid":999} | Find files for a specific GID |
| {"resources.mtime":{"$gt":"2020-10-10"}} | Find files modified after a specific date |
| {"project.name":"hadron"} | Find files based on user-defined metadata. The query key consists of the schema name and the field name, joined by a dot. For example, to query the name field of the project schema, use project.name. |
| {"resources.posix_uid":25301} | Find files of user k204221 (who has UID 25301) |
| {"image.Producer":"Max"} | Find images whose metadata field Producer is set to Max |
| {"resources.name": "search_me.jpg"} | Search for all files with the name search_me.jpg |
| {"resources.name": {"$regex": "file_[0-9].nc"}} | Search for all files whose names match the regular expression file_[0-9].nc |
| {"$or": [{"resources.posix_uid":24855},{"resources.posix_uid":25301}]} | Find files which belong to either UID 24855 or UID 25301 |
| {"$and":[{"resources.name": "surface_iow_day3d_temp_emep_2003.nc"}, {"resources.posix_uid": 25301}]} | Find files with the name surface_iow_day3d_temp_emep_2003.nc which belong to user k204221 (who has UID 25301) |
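
Nested $and / $or queries such as the one pretty-printed earlier on this page quickly become hard to type by hand. As a minimal sketch, the bash function below assembles them from smaller conditions. combine is a hypothetical helper name (not part of slk), and the function only concatenates strings, so each condition passed in must already be valid JSON.

```shell
#!/usr/bin/env bash

# Hypothetical helper: wrap one or more JSON conditions in a logical operator.
#   $1      operator name without the dollar sign, e.g. "and" or "or"
#   $2...   JSON conditions, each a complete JSON object
combine() {
    local op=$1
    shift
    local IFS=,                          # "$*" joins the conditions with commas
    printf '{"$%s":[%s]}' "$op" "$*"
}

# Rebuild the nested example query step by step.
by_owner_in_arch=$(combine and '{"resources.posix_uid":25301}' '{"path":{"$gte":"/arch"}}')
location=$(combine or "$by_owner_in_arch" '{"path":{"$gte":"/double/bm0146"}}')
query=$(combine and '{"resources.name":"INDEX.txt"}' "$location")

# Intended use (commented out because it needs access to StrongLink):
#   slk_helpers search_limited "$query"
```

Printing the result with echo "$query" | jq before running the search makes it easy to confirm that the nesting is what was intended.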

Advanced query examples

# two ways of quoting the query
$ slk search '{"resources.size":{"$gt": 1048576}}'
$ slk search "{\"resources.size\":{\"\$gt\": 1048576}}"

# using shell variables in calls of slk search
# ~~~~~~~~~~~~~~~~~~~~ method one ~~~~~~~~~~~~~~~~~~~~
$ id -u k204221
25301
$ slk search "{\"resources.posix_uid\":25301}"
...
# ~~~~~~~~~~~~~~~~~~~~ method two ~~~~~~~~~~~~~~~~~~~~
$ uid=$(id -u k204221)
$ slk search "{\"resources.posix_uid\":$uid}"
...
# ~~~~~~~~~~~~~~~~~~~~ method three ~~~~~~~~~~~~~~~~~~~~
$ slk search "{\"resources.posix_uid\":$(id -u k204221)}"
...

Note

The example shell commands are meant for bash. If you are using csh or tcsh, they will not work as printed here and have to be adapted. Please contact DKRZ support (support@dkrz.de) if you require assistance.
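
Escaping every double quote with a backslash is error-prone. One alternative, sketched here for bash, is to keep the JSON template in single quotes and let printf fill in the variable part. uid_query is a hypothetical helper name and not part of slk.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print a query that matches files owned by a numeric UID.
# The template is single-quoted, so no backslash escaping is needed.
uid_query() {
    local uid=$1
    case $uid in
        ''|*[!0-9]*)
            # Reject empty or non-numeric input so the JSON stays valid.
            echo "uid_query: not a numeric UID: $uid" >&2
            return 1
            ;;
    esac
    printf '{"resources.posix_uid":%s}' "$uid"
}

# Intended use (commented out because it needs access to StrongLink):
#   slk_helpers search_limited "$(uid_query "$(id -u k204221)")"
```

The same pattern works for string values by using "%s" inside the template, provided the value itself contains no double quotes or backslashes.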


© Copyright 2021, Deutsches Klimarechenzentrum GmbH.
