Publication of data for global dissemination and collaboration¶
Publishing and making data available via ESGF (Earth System Grid Federation) does not imply long-term archival of the data.
The ESGF provides global online visibility and accessibility of data products, together with globally uniform interfaces for web-based data search, access and subsetting. The large Model Intercomparison Projects (MIPs) conducted in recent years (e.g. CMIP6), in particular, could not have been carried out without effective ways of making petabytes' worth of climate model data globally available.
Step by Step Guide for ESGF¶
0) Initial contact with DKRZ to clarify the publication conditions¶
The following points have to be clarified before data can be accepted for ESGF-publication:
Data storage and volume
Data Reference Syntax
Contact DKRZ’s ESGF team with a request to publish a dataset in the ESGF, including information on the project (your own project or part of a larger consortium?), the expected data volume and the storage.
Data published in the ESGF via DKRZ must be stored on the in-house HPC Lustre file system. The quota needed for sustainable storage of the data must be provided by the data provider unless the data are part of a larger consortium such as CMIP. Quota on the HPC file system is granted in the framework of a data project at DKRZ.
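To state the expected data volume in the initial request, it can help to measure an existing directory tree. The following is a minimal sketch using only the Python standard library; the function names `dataset_volume` and `human_readable` are illustrative and not part of any DKRZ tool.

```python
import os


def dataset_volume(root):
    """Return (file_count, total_bytes) for all regular files below root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip broken symlinks and other non-regular entries.
            if os.path.isfile(path):
                count += 1
                total += os.path.getsize(path)
    return count, total


def human_readable(num_bytes):
    """Format a byte count using binary units (KiB, MiB, ...)."""
    size = float(num_bytes)
    units = ("B", "KiB", "MiB", "GiB", "TiB", "PiB")
    for unit in units:
        # Stop at the first unit that fits, or at the largest unit available.
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024
```

Calling `dataset_volume` on the top-level directory of the dataset and passing the byte count to `human_readable` gives a figure that can be quoted in the request. Note that the quota actually needed may differ from the summed file sizes, so treat the result as an estimate.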
Data published in ESGF must be in a standardised netCDF format. If your data are a contribution to a community project, the data standard is provided by that project, see e.g. CMIP6. If you want to publish data from your own project, it is recommended that the data be standardised along the lines of existing standards.
The directory structure of data published in ESGF must strictly follow a predefined Data Reference Syntax (DRS) so that the data can be accessed via the ESGF faceted search in the web interface. Essentially, the DRS is a defined directory structure that allows an individual file to be identified unambiguously among the myriad of files available in large MIPs.
Data structures which do not comply with the agreed DRS cannot be published in ESGF.
Every project in ESGF may define its own preferred DRS.
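As an illustration of how a DRS maps facets to a directory path, here is a minimal sketch. The facet order is modelled on the CMIP6 directory template and is an assumption for this example; the DRS fixed for your project in the publication agreement is authoritative. The function names, the `/pool` root and the facet values in the usage note are hypothetical.

```python
from pathlib import PurePosixPath

# Facet order modelled on the CMIP6 directory template (assumption: replace
# this tuple with the DRS agreed for your own project).
DRS_FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def build_drs_dir(facets):
    """Build the relative DRS directory from a dict of facet values."""
    missing = [key for key in DRS_FACETS if key not in facets]
    if missing:
        raise ValueError(f"missing DRS facets: {missing}")
    return PurePosixPath(*(facets[key] for key in DRS_FACETS))


def parse_drs_dir(path, root):
    """Split a directory below root into its DRS facets.

    Raises ValueError if the path is not below root or does not have
    exactly one path component per facet.
    """
    parts = PurePosixPath(path).relative_to(root).parts
    if len(parts) != len(DRS_FACETS):
        raise ValueError(
            f"expected {len(DRS_FACETS)} path components, got {len(parts)}"
        )
    return dict(zip(DRS_FACETS, parts))
```

For example, with illustrative facet values `CMIP6`, `CMIP`, `MPI-M`, `MPI-ESM1-2-LR`, `historical`, `r1i1p1f1`, `Amon`, `tas`, `gn` and `v20200101`, `build_drs_dir` yields a ten-level directory path, and `parse_drs_dir` recovers the same facets from that path below a given root. A check like this only verifies the depth of the structure, not whether each value is valid in the project's controlled vocabulary.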
1) Establish and sign publication agreement¶
The publication agreement form is available for download.
The organisation of the data and the adherence to data standards, especially the DRS, have already been discussed in step 0).
It is essential that the data to be published in ESGF remain publicly accessible at the location specified in the publication agreement.
2) Data standardisation¶
Your data need to comply with the structure laid out in the publication agreement. The processing of the data is to be performed by you.
DKRZ provides services, e.g. software packages such as cdo cmor, to support the data standardisation.
Once the processing of the data in accordance with the publication agreement is completed, the data have to be openly accessible on DKRZ’s Lustre file system for quality assurance checks by DKRZ staff.
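Before handing the data over for QA, you may want to confirm that the files can actually be read by others. The sketch below assumes that "openly accessible" means readable by all users on the file system (the POSIX "other" permission bits); please confirm the exact requirement with DKRZ staff. The function name `find_unreadable` is illustrative.

```python
import os
import stat


def find_unreadable(root):
    """Yield paths below root that users outside the owning group cannot read.

    Directories need both read and execute permission for 'other' so that
    they can be listed and traversed; files need read permission.
    """
    dir_bits = stat.S_IROTH | stat.S_IXOTH
    for dirpath, _dirnames, filenames in os.walk(root):
        try:
            if os.stat(dirpath).st_mode & dir_bits != dir_bits:
                yield dirpath
        except OSError:
            yield dirpath
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.stat(path).st_mode & stat.S_IROTH:
                    yield path
            except OSError:
                # For example a broken symlink: report it as not readable.
                yield path
```

An empty result from `list(find_unreadable(...))` on the dataset's top-level directory suggests that the permission bits are set as assumed. Parent directories above the chosen root and any access control lists are not checked by this sketch.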
3) QA checks¶
DKRZ staff perform quality assurance (QA) checks for compliance with the data standards laid out in the publication agreement.
If the data pass the QA checks, publication in ESGF can proceed.
If the data are found not to fulfil the required standards, DKRZ staff will inform you about the amendments needed. After these have been applied, the data will be checked again (and sent back with further requests for amendments if needed).
4) Data ingestion into the ESGF catalogue¶
DKRZ staff set up and perform the ingestion of the data into the ESGF catalogue.
Please note: data published in ESGF and stored on the DKRZ Lustre file system are not backed up!
5) ESGF publication and data access via the DKRZ CMIP data pool¶
Once fully ingested, the data are findable and accessible via the ESGF web interface.
Additionally, the data are accessible to DKRZ users on Mistral via the DKRZ data pool.
This also holds for all other datasets published in ESGF by DKRZ, i.e. CMIP5, CORDEX, ReKlies or MiKlip data are easily accessible and do not have to be downloaded. For more information, please see the description of the DKRZ CMIP data pool.
6) Long-term archival of the data at DKRZ¶
If desired, the data can be preserved using the long-term archiving (LTA) service at DKRZ.
The option to archive data in LTA WDCC is part of the publication agreement signed at the beginning of the ESGF publication process and involves further interaction between DKRZ staff and the data provider. For more information regarding the process, please contact firstname.lastname@example.org.
Data archived in LTA WDCC can still be accessed globally using the ESGF web interface.