Data publication guide
How to publish your research data in Yoda
This guide outlines the basic steps that are necessary for creating an Earth science data publication using Yoda. An explanation of the terminology used in this guide can be found at the bottom of this page.
Yoda (short for ‘your data’) is a research data management service developed by Utrecht University that enables researchers to deposit, publish, and preserve their research data. It offers researchers a collaborative environment to store their research data. This research data can be shared with collaborators or members of the research group, if needed. The steps to go from storing research data to a formal data publication that can be accessed and cited by others are outlined in the guide below. Once data is published in the Yoda repository, the associated metadata can be harvested by research data catalogs, such as the EPOS Multi-scale laboratories data catalog, making your data publication Findable, Accessible, Interoperable, and Re-usable (FAIR).
The process to publish your data in Yoda is outlined by the 9 steps below:
The Geosciences data manager (Vincent Brunst – v.brunst[at]uu.nl) can:
1) give you access to the Yoda data repository to store your research data, and
2) answer any questions you still have after reading this guide.
The research data from which you want to create a data publication can:
- i) Already be in the Yoda environment. In this case you can ignore step 5 from this guide (since your data is already in Yoda).
- ii) Be stored outside of the Yoda environment. In this case you need to upload your dataset to Yoda in step 5 of this guide.
In both cases:
- Create a folder in which to place the dataset that you want to publish.
- Decide on the data that you want to include in the dataset for publication. Base this on what you, your journal, and/or your funder want to have published.
- Give the files and (sub)folders appropriate names, making them easy to interpret for other researchers.
- Assign a logical folder structure.
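The naming advice above can be checked automatically before upload. The sketch below is only an illustration: the rule it enforces (letters, digits, dots, underscores, and hyphens only) is an assumed convention, not a Yoda requirement, and the function names `is_safe_name` and `find_problem_paths` are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed convention (not prescribed by Yoda): names consist only of
# letters, digits, dots, underscores, and hyphens.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def is_safe_name(name: str) -> bool:
    """Return True if a file or folder name follows the assumed convention."""
    return bool(SAFE_NAME.match(name))


def find_problem_paths(dataset_root: Path) -> list[Path]:
    """Return all files and folders below dataset_root whose names break the convention."""
    return sorted(p for p in dataset_root.rglob("*") if not is_safe_name(p.name))
```

Calling `find_problem_paths(Path("my_dataset"))` on a local copy of the dataset folder lists the entries that may be worth renaming; adjust the pattern to whatever convention your research group uses.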
Create any necessary documentation (called unstructured metadata) that provides essential information for future use of the dataset, notably how the data was collected, how it should be interpreted, and how it can be reused.
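As a starting point for this documentation, a skeleton file can be generated and then filled in by hand. This is a minimal sketch: the three section headings come from the questions named above, while the `README.txt` file name, the layout, and the function names are assumptions rather than a format that Yoda prescribes.

```python
from pathlib import Path

# Headings mirror the three questions named in the guide; the file name
# and layout are illustrative assumptions.
README_SECTIONS = (
    "How the data was collected",
    "How the data should be interpreted",
    "How the data can be reused",
)


def build_readme(title: str) -> str:
    """Return the text of a README skeleton with one placeholder per section."""
    lines = [title, "=" * len(title), ""]
    for section in README_SECTIONS:
        lines += [section, "-" * len(section), "TODO", ""]
    return "\n".join(lines)


def write_readme(dataset_root: Path, title: str) -> Path:
    """Write the skeleton to README.txt in the dataset folder and return its path."""
    target = dataset_root / "README.txt"
    target.write_text(build_readme(title), encoding="utf-8")
    return target
```

Replace each `TODO` with the actual description before submitting the dataset for review.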
Consult the requirements of your funding agency and/or publisher for both the type of license and the embargo period. In most cases we advise using a CC BY 4.0 license. This license allows anyone to share (copy and redistribute the data in any medium or format) and adapt (remix, transform, and build upon the data) the dataset, on condition that they i) give correct attribution (you receive credit through citation), ii) indicate if and how the dataset was adapted, and iii) link to the license information. Depending on your licensing requirements, you can also use this tool to select the appropriate license, or contact the data manager when in doubt.
The embargo period (usually set between 0 and 3 years) should be decided upon based on your preference and the requirements laid out by the funder or journal. During the embargo period, only the metadata is published (so others are aware that the dataset exists). The actual content is published only once the embargo period has elapsed.
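The date on which the content becomes public follows directly from the publication date and the embargo length. The sketch below illustrates that arithmetic only; the function names are hypothetical, and both the handling of 29 February and the choice to treat the content as public on the end date itself are assumptions, not documented Yoda behaviour.

```python
from datetime import date


def embargo_end(publication_date: date, embargo_years: int) -> date:
    """Return the date on which an embargo of whole years ends."""
    if embargo_years < 0:
        raise ValueError("embargo_years must be zero or positive")
    target_year = publication_date.year + embargo_years
    try:
        return publication_date.replace(year=target_year)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 28 February (assumption).
        return publication_date.replace(year=target_year, day=28)


def content_is_public(publication_date: date, embargo_years: int, today: date) -> bool:
    """Return True once the embargo has elapsed (assumed to include the end date)."""
    return today >= embargo_end(publication_date, embargo_years)
```

For example, `embargo_end(date(2024, 3, 1), 2)` gives 1 March 2026; until that date only the metadata would be visible.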
Adding metadata to your dataset creates a so-called data package. Metadata provide qualitative descriptions of your data, e.g. how and where the data were obtained, on what samples, etc. The steps to add metadata are described here.
The data manager will review the file types, folder structure, file formats, and metadata. Apply the corrections that the data manager proposes. Once you have implemented the suggested changes, the review can be repeated if necessary; otherwise the dataset continues to the next publication step.
The dataset can now be transferred as a bounded data package for sustainable storage in the Yoda archive (also known as the “Vault”). Your data package will be retained unchanged during its retention period in the vault. The steps to archive your data are found here.
After archiving the data package in the vault, it is ready to be published. Your data package will obtain a DOI and the structured metadata will be published so it can be harvested by data catalogs, making your data publication findable, accessible, interoperable, and re-usable! The process of publishing your data package is described here.