The WDS-ECR Network was set up in September 2017 to promote scientific data stewardship, share best practices, and foster better communication among ECRs. As a co-lead of the Network, alongside Sabrina Delgado Arias (Science Systems and Applications, Inc.) and Ivan Pyshnograiev (Igor Sikorsky Kyiv Polytechnic Institute), I coordinate events, speaker series, and periodic teleconferences, as well as liaise with other ECR communities to share ideas on future data practices.
In 2018, the WDS-SC invited a representative of the ECR Network to take part in their meetings by opening up a one-year rolling seat on the Committee. This initiative facilitates communication with the next generations of data managers and enables WDS to develop activities targeting ECRs’ interests. I represented the Network on the WDS-SC from July 2018 to June 2019. Being a member of the WDS-SC was an amazing experience and opportunity.
All SC members are working pro bono to share their ideas on how to best shape the future of data stewardship for better science. This is very exciting! SC members meet each month via teleconference, and then twice a year in person, where most of the plans for action are validated. In order to reach out to different communities, face-to-face meetings of the WDS-SC are often co-located with other WDS events such as regional conferences. I attended two such meetings, one in November 2018 in Cape Town, and another in May 2019 in Beijing. During the very intense two-day meetings, SC members present their ideas and discuss the tasks for WDS to undertake in the following months. I got to meet exceptional data experts from around the world, and took part in the decisions, strategic actions, and activities of WDS.
In particular, I participated in the preparation of a training workshop targeting ECRs that is sponsored by a grant from the European Geosciences Union. I saw how much work is involved in setting up such events, and I am sure it will be very rewarding for PhD students and Post-docs to learn more about Research Data Management. The training workshop is a great opportunity for those attending and crucial for the future of science. Being part of the WDS-SC provided me with the chance to share my input when necessary. I really appreciated seeing that my suggestions were valued. I thank all the SC members for their warm welcome and for the trust I was given. I also encourage all early career scientists and researchers who work with data to join the WDS-ECR Network. It might be you representing the ECR Network on the WDS-SC in the future!
Talking of which... Sabrina Delgado Arias will represent the WDS-ECR Network on the WDS-SC from July 2019 to June 2020. We wish her all the best!
In this piece, the authors begin to describe intersections of information maintenance and care ethics in ways that are real and meaningful for information maintainers (i.e., those who manage, maintain, and preserve information systems).
Contributors to this document have varied experiences with information maintenance: community organizers and facilitators, archivists, repository managers, project managers, designers, librarians, researchers, grantmakers, educators, and more. The authors invite those in occupations and roles who understand that the relationship between information maintenance and an ethic of care is especially valuable to read, react, share, and engage with this potluck of ideas. Please circulate widely!
The main point of the article is to encourage the broad research community to work towards open and FAIR data, and to put in place the policies, guidelines, incentives, and funding necessary to support the cultural and systemic change needed in how we handle our scientific data.
A Blog post by Ingrid Dillo (WDS Scientific Committee Vice-chair; Deputy Director of WDS Regular Member: DANS, Data Archiving and Networked Services)
In this WDS Blog post, I want to highlight a set of guidelines developed in a community that is not yet very well represented within the membership of the World Data System, but that is getting more and more involved. I am talking about the Humanities. Coming from the Humanities myself, and being active in a broader international data environment, I know from experience that the Humanities data community has a lot to offer other disciplines. Humanists often struggle with very fuzzy, multi-interpretable, scattered, and incomplete data, and so they need to be highly resourceful. For the Digital Humanities, therefore, international collaboration is a sine qua non.
An example of such international collaboration is the PARTHENOS Project, which comprises 16 European partners, including DANS (a WDS Regular Member). PARTHENOS stands for ‘Pooling Activities, Resources and Tools for Heritage E-research Networking, Optimization and Synergies’. It is inspired by Athena Parthenos, the Greek goddess of wisdom, inspiration, and civilization.
PARTHENOS aims to strengthen the cohesion of research in the broad sector of Linguistic Studies, Humanities, Cultural Heritage, History, Archaeology, and other related fields. This is being achieved through, for example, the definition and support of common standards and the harmonization of policy definitions and implementation.
One of the activities under the umbrella of PARTHENOS concerns the definition of common policies and implementation strategies for Research Data Management (RDM). The ubiquitous FAIR principles were chosen as a framework to structure a set of guidelines and recommendations. The concrete (and freely available) outcome of this activity is the very practical booklet: Guidelines to FAIRify data management and make data reusable.
The booklet offers a series of guidelines to align the efforts of data producers, data archivists, and data users in the Humanities, and thus make research data as reusable as possible. The guidelines are the result of the work of over 50 PARTHENOS project members, who were responsible for investigating commonalities in the implementation of policies and strategies for RDM and who conducted desk research, questionnaires, and interviews with selected experts to gather around 100 current data management policies—including guides for preferred formats, data review policies, and best practices (both formal and tacit).
The booklet also offers recommendations for two important stakeholder groups:
Researchers and research communities,
Research infrastructures and, in particular, data repositories.
By focussing on (meta)data and repository quality, a set of twenty guidelines was extracted. For easy reference, the guidelines have been grouped under the four FAIR principles.
The guide starts with an important message: Invest in people and infrastructure. Investing in data infrastructures and trustworthy data repositories, as well as in hiring and educating data experts, is an important prerequisite to be able to implement any data management guideline. This way, we can enable researchers to comply with data management mandates coming from funders and journals.
Please have a look at the set of guidelines and see whether they are reusable in your domain.
Data drives so much of our professional life today, from the organization of business meetings (virtual or face-to-face) to the publication of our research results. Data may simplify or complicate our lives, but for sure it is ubiquitous, though often unseen and behind the scenes.
But what are the future challenges? And who are the future influencers and curators when it comes to scientific data, its analysis, and its curation? We take a closer look at "our" WDS future generation, the enthusiastic group that makes up the Early Career Researchers and Scientists (ECR) Network. And we take our hats off to our young and outstanding awardees, such as Wouter Beek in 2018. They represent the next generation and are our link to upcoming thrills and challenges in data science and management. They are our inspiration and hope when it comes to data curation for generations to come, and we hope they raise their voices to become data influencers in the scientific community.
Want to be part of the next generation of data influencers? Want to meet fellow data experts keen to share their experience, or to support an outstanding colleague working with data? Do not hesitate to join and participate in the different activities WDS proposes. The WDS ECR Network is always happy to invite data experts or future data influencers to join their telecons and events. Moreover, the WDS Data Stewardship award is a good opportunity for you to support your colleague to be part of the next generation of data experts: the Call for Nominations is now open.
Big data gurus and advocates for a cyberinfrastructure or big data science describe a data-centric future in which massive quantities of digital data will be available for reuse in research, artificial intelligence, making predictions, or engaging in data-driven discovery. In the world of biology, this translates into the expectation that molecular sequence information will be available from nucleotide repositories such as GenBank, or that any and all occurrence data can be found at GBIF. It is also presumed that the data will have been vetted and, in all aspects, are trustworthy.
The vision is flawed. An unknown but large fraction of newborn digital data does not make it beyond the maternity ward. If data are to be properly prepared for re-use in the big data world, they must have moved a long way from the hands of their creators and into the custody of data managers and repositories that will guarantee access to vetted content in perpetuity.
There are hundreds of thousands of sources where digital information is born. The long tail of parents includes individual researchers, research teams, research programs, legacy data recovery projects, local, state, and national governmental bodies, and international initiatives. These parents rarely have the understanding or skillsets to ensure that their newborn will mature appropriately for a rôle in the big data world. For this to happen, data must be handed on to those who specialise in data management and curation. These adoptive parents will shepherd the content through the maturation process that will make it ready for repositories that are designed to make trustworthy data and services available to the public. The challenges to completion of the path are numerous. The first step is simply to make the data visible and accessible. Bad data need to be set aside or put back on the right path. For content to be discoverable, standardized metadata and ontologies need to be added so that the data can be found in the appropriate context. Interoperability requires access through appropriate services and for the data to be clothed with standardized ontologies and metadata. Just as the idiosyncratic swaddling clothes must be set aside, the new descriptors will need to be embellished with increasing detail, and be continually corrected and improved. Provenance metadata will help creators and managers gain credit for their effort, and will open up a pathway through which concerns about the data can be expressed. There will be problems that are specific to particular disciplines. As an example, relationships among taxa in ‘evolutionary trees’ that are created by algorithms become less trustworthy as new information and new algorithms emerge. In the biodiversity sciences, taxa may be mis-identified.
Further, with the passage of time, new species are discovered, a process that renders ambiguous those taxa identified by earlier, less stringent criteria. The ecosystem through which the content moves must provide the support that ensures continued fitness for purpose. Confidentiality and ethical concerns vary with subject matter but also have to be addressed.
As data mature, they will move from the hundreds of thousands of parents to a small number of data repositories that are funded using models that guarantee the persistence of their services. As far as is feasible, we expect the managers and repositories to apply the FAIR principles to the content they hold. Then, if the holder of the baton can meet the expectations of the CoreTrustSeal accreditation, the data will have found a secure and persistent home, with data ready for reuse. Fifty or so repositories have gained the CoreTrustSeal certification. But, as we have seen from the recent US governmental shutdown in December 2018 and January 2019, even major and certified data suppliers cannot be relied upon and may blink out unpredictably.
Many components already exist, but they are not joined up. Not only do most data fall by the wayside, but much is not fit for a rôle in a data-centric world. The data are too contextualised, and descriptors are incomplete or inaccurate. Few, if any, of the big data world providers allow users to correct errors. The consequence is that users of open data have to work with contaminated material. The World Data System is charged by the International Science Council to promote universal and equitable access to scientific data and information and to increase the capacity to generate new knowledge. WDS is especially concerned with the trustworthiness of the data and services. We will move further and faster when we acknowledge that the research and discovery paradigm needs to be complemented with an investment in infrastructure and services. That investment will provide the framework and support that is required for data to live long and to prosper.