The Wheat Data Interoperability Working Group (WDI) was created within the framework of the Research Data Alliance (RDA). The RDA is a global initiative that aims to foster the open sharing of data across technologies, disciplines, and countries to address society’s major challenges.
The WDI’s goal is to provide a common, open-standard framework for describing, representing, linking, and publishing wheat data. Here, we present the case for establishing the Wheat Data Interoperability Guidelines, which will promote and sustain wheat data sharing, reusability, and interoperability. We have decided to focus on a limited set of wheat data types: SNPs, genomic annotations, phenotypes, genetic maps, physical maps, and germplasm.
The Wheat Data Interoperability Guidelines will have a threefold impact. They will:
- Promote the adoption of common standards, vocabularies, and best practices for wheat data management.
- Facilitate access to, discovery of, and reuse of wheat data.
- Facilitate wheat data integration.
We conducted a survey from April 7 to June 3, 2014, to identify existing data management practices within the wheat community. We received replies from 196 individuals in at least 31 different countries. Our preliminary analysis of the results tends to confirm the need for a framework and guidelines for wheat data management:
- Most of the respondents (114) currently store their data on local drives, but the majority of them (84) are willing to use shared databases and repositories.
- More than 50% of the respondents have not yet established data management guidelines.
- When asked which data types will be the most important over the next five years, the majority of the respondents listed the same categories that we had chosen (i.e., phenotypes, SNPs, genomic annotations, germplasm, genetic maps, and physical maps). Only 24 of the 196 respondents mentioned other data types.
- In addition to using a variety of data types, the wheat community also employs a wide range of data formats (standardized and unstandardized). For some data types, such as phenotypes, the formats in use are inconsistent. We view this situation as an opportunity for standardization and harmonization.
- The use of ontologies and metadata standards is not common. Only 49% of the respondents are currently using ontologies, and only 23% are using metadata standards. Most of the respondents do not see how such standards could be beneficial, and some cite the lack of implementation and/or standardization as their reason for not conforming to a standard themselves. This underscores the need to highlight the advantages of common vocabularies and metadata standards for wheat data description and representation.
The next steps will consist of:
- Discussing these issues with wheat data experts: the goal is to a) identify the minimum set of properties that should characterize datasets of each data type to ensure meaningful data sharing, b) identify the gaps in standards that remain to be filled, and c) formulate recommendations for data formats and metadata standards. We are planning to hold a one- to two-day workshop with wheat data experts before the end of October.
- Writing a cookbook that will provide guidelines for wheat data description and representation. We will base our methodology on the one that the FAO used to create the LODE cookbook.
The benefits expected from the Wheat Data Interoperability Guidelines are the following:
- Wheat data managers and data scientists will have a common and global framework with which to describe, document, and structure their data.
- Researchers, growers, breeders, and other data users will be able to seamlessly access, use, and reuse a wide range of wheat data. Data linking will also facilitate the development of new data analyses and knowledge discovery methodologies.
- Data managers and scientists focused on other plant species will be able to benefit from the reusable data framework that we establish.
- Researchers working on other plant species will be able to more easily access and reuse wheat data and forge links with their own data.
The cookbook could potentially be adapted for use with other crops, such as rice and maize, which are also very important for food security.