WDS Knowledge Network Concept
WDS Infrastructure Model
The World Data System of the International Science Council (WDS) is supporting a conceptual design, or vision, of interlinked foundational Global Research Infrastructure that is sustainable, scalable, and distributed amongst leading organizations, data centres, and initiatives. To achieve this, one has to identify the major actors (people, institutions, research outputs, and systems) that contribute to the conduct of science today, and then implement a minimum Knowledge Network that will maximize the return on investment for all (or most) of these actors.
A central characteristic of the Knowledge Network is that it should allow contributions from many sources, including—at least conceptually—from non-traditional sources such as project websites, social media, and citizen science, in addition to the obvious source of metadata.
WDS foresees that such a Knowledge Network will draw on the example of Linked Open Data, and reuse as many services, components, standards, and as much existing capacity as possible. WDS has a role in cementing the missing relationships in the network, and in providing components (nodes) that are not yet available.
The Knowledge Network, in its implementation, must satisfy a number of considerations, some of which were mentioned already:
- Scalability. This is required in two ways:
  - Limiting the extent to which institutions, projects, or other initiatives need to process and commit content in bulk, relying instead on many near-real-time contributions from all participants.
  - Allowing for a distributed resource, while avoiding fragmentation. This distribution is a reality already, but it is not yet possible for humans or machines to access and exploit the network in a predictable, uniform way. Centralization or a monolithic solution will not work, for a number of practical, political, and technical reasons.
- Sustainability. This applies to technical, governance, and financial elements of implementation.
  - Technical sustainability pivots on standards, scalability, and open, extensible software.
  - Sustainable governance should include community support and buy-in, quality assurance, and elements of trust.
  - Financial sustainability is a prerequisite, given the extent to which other investment and infrastructure will in future rely on the Knowledge Network. There is no apparent long-term funder for the entire network; hence, the practical solution must be a community consensus on the loose couplings between sustainably funded components.
- Diversity. A portfolio of initiatives, projects, and established interests all contribute to and/or own parts of the network. There is no incentive to recreate these components; hence, the only sensible approach is maximum reuse of existing components within the Knowledge Network.
- Ease of Use. A significant hurdle to current use of research infrastructure and services is the weight of technical knowledge required: it should be easy and simple to contribute to and exploit the network.
- Registries. Any Knowledge Network will rely essentially on a ‘Registry of Registries’ (a subset of the Linked Open Data universe), with roles assigned to the contents of these registries.
- Relationships. While many institutions or initiatives contribute the elements of the Knowledge Network (the ‘nodes’), very few (if any) store the relationships between them explicitly.
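To make the last two considerations more concrete, the following is a minimal sketch of what a ‘Registry of Registries’ with assigned roles might look like, together with an explicitly stored relationship. It is an illustration only, not a WDS design: the class names, the `id_scheme` field, the `scheme:value` identifier form, the predicate name, and the placeholder identifiers are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Registry:
    name: str       # the existing registry being reused
    role: str       # kind of node it contributes to the network
    id_scheme: str  # hypothetical prefix used to recognise its identifiers


@dataclass(frozen=True)
class Relationship:
    subject: str    # identifier of the source node
    predicate: str  # type of relationship
    obj: str        # identifier of the target node


# Illustrative entries only; the roles follow the node types named in the text.
REGISTRY_OF_REGISTRIES = [
    Registry("ORCID", "People", "orcid"),
    Registry("DataCite", "Research Outputs", "doi"),
    Registry("re3data", "Trusted Data Repositories", "re3data"),
]


def resolve_role(identifier: str) -> Optional[str]:
    """Return the role of the registry responsible for an identifier, if any."""
    scheme, _, _ = identifier.partition(":")
    for registry in REGISTRY_OF_REGISTRIES:
        if registry.id_scheme == scheme:
            return registry.role
    return None


# A relationship stored explicitly rather than left implicit in metadata.
# Both identifiers are placeholders, not real records.
authorship = Relationship(
    subject="orcid:0000-0000-0000-0000",
    predicate="authorOf",
    obj="doi:10.0000/example",
)

print(resolve_role(authorship.subject), authorship.predicate,
      resolve_role(authorship.obj))
```

The point of the sketch is that the network itself stores only roles and relationships; the nodes stay in the registries that already own them.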
More About Relationships
A central aspect of the Knowledge Network, as depicted in Figure 1, is the relationships between the ‘nodes’ or elements of the network. As an example, ‘People’ can have the following standard relationships:
- To other People, by virtue of Collaboration, Association, and so on.
- To Trusted Data Repositories, by Depositing Research Outputs.
- To their own Research Outputs, through Authorship.
- Indirectly to Coverages, by virtue of Interest in Topics, Locations, and Time Periods.
- By being Hosted or Employed by Institutions.
Many of these relationships are implicit in metadata, but cannot easily be extracted by others. For example, the fact that two people are co-authors of a dataset implies collaboration, but this cannot easily be determined from metadata. The Knowledge Network mines metadata for these relationships, but they can also be contributed from many other sources. At a minimum, simple interfaces should be available to commit metadata, originating in a number of domain standards, to the Knowledge Network as and when it is created or updated. Collaboration with DataCite (WDS Partner Member) will be key in this regard.
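As a rough illustration of the co-authorship example, the sketch below derives explicit collaboration relationships from a handful of metadata records. The record layout, the field names (`id`, `creators`), and the `collaboratesWith` label are hypothetical and do not follow any particular metadata standard; real metadata would first need to be mapped from its domain standard and its creators matched to persistent identifiers.

```python
from itertools import combinations

# Hypothetical, simplified metadata records (assumed structure).
records = [
    {"id": "dataset-1", "creators": ["person-A", "person-B", "person-C"]},
    {"id": "dataset-2", "creators": ["person-B", "person-A"]},
    {"id": "dataset-3", "creators": ["person-D"]},
]


def mine_collaborations(records):
    """Map each pair of co-creators to the datasets that evidence the link."""
    collaborations = {}
    for record in records:
        # Sort and de-duplicate so each pair has a single canonical ordering.
        creators = sorted(set(record["creators"]))
        for pair in combinations(creators, 2):
            collaborations.setdefault(pair, []).append(record["id"])
    return collaborations


collaborations = mine_collaborations(records)
for (a, b), datasets in sorted(collaborations.items()):
    print(f"{a} collaboratesWith {b} (evidence: {', '.join(datasets)})")
```

Keeping the evidencing dataset identifiers alongside each pair means a mined relationship can be traced back to its source metadata, which may help with the quality assurance and trust elements mentioned earlier.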
Leveraging Existing Infrastructure
Several elements of the Knowledge Network exist already; for example, DataCite assists with the allocation of Digital Object Identifiers (DOIs) to some research outputs (data specifically), ORCID (WDS Partner Member) assigns permanent identifiers to researchers, and so on. Figure 2 identifies these existing components—provisionally, at this point.
Specific actions that can form part of an initial phase of implementation include:
- Collaboration with DataCite, leveraging the merger between re3data, DataBib, and DataCite to establish a Registry of Trusted Data Repositories. This could be either a subset of the registry envisaged by DataCite, or an independent extension of that registry, managed by WDS, that aggregates quality and certification properties (CoreTrustSeal, ISO 16363, etc.).
- Inclusion of ORCID as a major registry of permanent identifiers for researchers.
- DOIs for citable datasets, as provided by DataCite.
- Relationships mined from WDS and possibly DataCite metadata, supplemented by information from the WDS membership.
The Knowledge Network can grow in subsequent phases in three ways:
- Extending to other elements (topic coverages, institutions, publishers, funders).
- Inclusion of repositories that do not necessarily form part of the WDS membership.
- Contribution of relationships that do not originate from metadata.
All of these are desirable extensions that improve the utility of the Knowledge Network for users.