Open Science Monitor

What are open research data?

Open research data are the data underpinning scientific research results that have no restrictions on access, so that anyone can access them through the internet.

For the purposes of this monitor, the indicators distinguish between ‘open data’ in general and ‘open research data’, and focus on the latter. Many types of data can be open (e.g. government data), but that scope is too large for the monitor and too difficult to analyse, so we focus specifically on the data underlying research publications.

Explore the indicators related to open research data

Select an indicator to see its description, visualise the data, understand its limitations, and identify the data sources.

Research data repositories

  • Number of data repositories
  • Case studies
    The research data repository case studies illustrate how open research data repositories can contribute to open science, how they can operate, and their coverage. Detailed case studies are also available for download.

Funder policies on data sharing

Researcher attitudes towards data sharing

Case Studies

Summaries provide a short description of the ‘what’, ‘who’, ‘when’, and ‘why’ of open science-related initiatives. Detailed case studies are also available for download.

  • Structural Genomics Consortium
    Open access to research results on less well-studied areas
  • FAIRport
    Aiming to provide a minimal but comprehensive framework for developing and implementing good management and stewardship of research data and metadata in life sciences
  • Zenodo
    A general-purpose open access repository of research data and journal publications
  • Reproducibility Project
    A collaborative effort to replicate 100 psychology experiments
  • Sloan Digital Sky Survey
    An astronomical survey, collecting large data sets to reflect the large-scale structure of our universe in multi-coloured images
A detailed methodology report (PDF, 229 KB) describes how the monitor was developed.

Open research data case study: figshare

What is figshare?

Figshare is a web-based platform that helps academic institutions, publishers and researchers to manage and disseminate all their research outputs and to measure the public attention those outputs receive. All research outputs on figshare are made available in a citable, shareable and discoverable manner.

How does figshare contribute to open science?

Figshare was established with the mission to make all academic data as open as possible. Uploading content is free, and all data are available under liberal Creative Commons licenses. Users do not need a figshare account to access content: all data can be downloaded directly from the site, without author permission.
Figshare also has an open API allowing for integration and interoperability with other academic systems. This allows content to be uploaded and downloaded, and information to flow in both directions between systems, without users having to interact with the figshare interface.
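
As a rough illustration of what programmatic access through an open API can look like, the sketch below builds a request for one page of public items. The base URL `https://api.figshare.com/v2`, the `/articles` endpoint and the `page` and `page_size` parameters are assumptions drawn from figshare's public v2 API rather than details given in this case study, and the function names are hypothetical. Check the current API documentation before relying on any of them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of figshare's public v2 API (not stated in the case study).
FIGSHARE_API_BASE = "https://api.figshare.com/v2"


def build_articles_url(page: int = 1, page_size: int = 10) -> str:
    """Build a URL that lists public items, one page at a time."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive integers")
    query = urlencode({"page": page, "page_size": page_size})
    return f"{FIGSHARE_API_BASE}/articles?{query}"


def fetch_public_articles(page: int = 1, page_size: int = 10):
    """Fetch one page of public items as parsed JSON (needs network access)."""
    with urlopen(build_articles_url(page, page_size), timeout=30) as response:
        return json.load(response)
```

Calling `fetch_public_articles()` sends an unauthenticated request, in line with the statement above that no account is needed to access content. Uploading would require authentication and is not sketched here.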

How does figshare work?

The platform allows any file format to be presented and visualized in a customized browser so that illustrative figures, data and other file sets, diverse audio visual media, papers, posters, and presentations can be disseminated in a way that complements traditional scholarly publishing technologies.

What coverage does figshare have?

Figshare has global coverage: based on content views, it has users in every country and at every major research institution in the world. The countries with the most figshare users are the U.S., UK, India, Germany, Australia, Canada, China, Japan, France and Italy. Figshare covers a wide range of subject categories, with the highest uptake in the life sciences. It supports files of up to five terabytes and of all types, including figures, media, datasets, filesets, posters, papers, presentations, theses and code.

[Three charts not reproduced.]
Source: Data provided by figshare (2016).

Open research data case study: GenBank

What is GenBank?

GenBank is a database of more than 197,000,000 assembled annotated DNA sequences for more than 340,000 formally described species. It is part of the International Nucleotide Sequence Database Collaboration (INSDC), a joint initiative of the National Center for Biotechnology Information (NCBI) in the U.S., the DNA Data Bank of Japan (DDBJ) and the European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI).

How does GenBank contribute to open science?

GenBank is a data-type specific open data repository that holds biological sequence data that is freely available for (re)use by the scientific community. Developed before the world wide web, it is one of the earliest open scientific databases, and the amount of data stored in it continues to grow.
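
To show what free access for (re)use can mean in practice, the sketch below builds a request for a single sequence record by accession number. It assumes NCBI's E-utilities `efetch` service, with `db=nuccore`, `rettype=gb` and `retmode=text`. This case study does not describe that service, so treat the endpoint and parameters as assumptions to verify against NCBI's documentation. The function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: NCBI E-utilities efetch (not described in the case study).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_genbank_url(accession: str) -> str:
    """Build an efetch URL for one nucleotide record in GenBank flat-file format."""
    accession = accession.strip()
    if not accession:
        raise ValueError("accession must be a non-empty string")
    params = {
        "db": "nuccore",     # nucleotide database
        "id": accession,     # accession number of the record
        "rettype": "gb",     # GenBank flat-file format
        "retmode": "text",   # plain text rather than XML
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_genbank_record(accession: str) -> str:
    """Download the record as text (needs network access)."""
    with urlopen(build_genbank_url(accession), timeout=30) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_genbank_record("U49845")` would request the record with accession number U49845. The accession is used here only as an illustration.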

How does GenBank work?

GenBank is one of three nodes of the INSDC; the other two are the European Nucleotide Archive (ENA), hosted by EMBL-EBI, and the DDBJ. Scientists can submit sequence data to any of the three nodes, regardless of country of origin. Submissions come from both individual laboratories and large-scale sequencing centers. The nodes act as mirrors for the data, ensuring the data are stored in multiple places and are always available. Once the data are shared across nodes, only the accession number reveals the original submission node. While the underlying data are the same at each node, the graphical representations and the analyses that can be done on the websites vary across the nodes. GenBank processes 80–85% of the submissions amongst the three nodes.

What coverage does GenBank have?

Scientists from all over the world upload DNA sequence data to GenBank and use the data it holds.

[Chart not reproduced.]
Source: GenBank statistics webpage (2016), https://www.ncbi.nlm.nih.gov/genbank/statistics/. Retrieved: December 23, 2016. Figures have been redrawn from the originals.

Notes
[Two charts not reproduced.]
Source: Wiley’s Research Data Insights Survey (2014).
Retrieved: December 23, 2016. Figures have been redrawn from the originals.

Wiley ran the Researcher Data Insights Survey in March 2014 to understand how and why researchers share their data. The survey was sent to 50,000 Wiley researchers from both the sciences and the arts; 2,558 researchers at least partially completed it.

[Two charts not reproduced.]
Source: The State of Open Data, Digital Science Report (2016).
Retrieved: December 23, 2016. Figures have been redrawn from the originals.

The State of Open Data survey was run by figshare, Springer Nature and Digital Science in 2016. The aim of the survey was to understand researchers’ attitudes to and experiences of working with data, sharing it and making it open; 2,000 researchers responded from a variety of countries and disciplines.

Structural Genomics Consortium

  • Demonstrates an open and collaborative approach at every stage of its working model
  • Enables a wider range of actors to participate in drug discovery
  • Catalysed drug discovery work on rare and neglected diseases

What?

Open access to research results on less well-studied areas of the human genome with the aim of catalysing human biology research and drug discovery. The Structural Genomics Consortium (SGC) has placed over 1,500 protein structures and 75 kinase structures in the public domain.

When?

2004-2011.

Who?

The SGC started out as a collaboration between researchers at the University of Oxford and the University of Toronto. The network has since expanded to include universities in Brazil, Germany, Sweden and the United States.

Why?

The SGC aims to mobilise a critical mass of expertise in order to overcome a decrease in productivity in drug discovery resulting from patenting policies and the complexity of the underpinning science.

The Open Science Element

The SGC aims to remove the barriers to participation and collaboration in drug discovery by:

  • Forgoing patent claims
  • Providing open access to all outputs, including through the Protein Data Bank
  • Sending samples to researchers

Data FAIRport

  • Provides guidelines to support management and stewardship of research data in the life sciences
  • Developed using an open and collaborative approach
  • Used in an increasing number of projects and research communities

What?

The initiative aims to provide a minimal but comprehensive framework for developing and implementing good management and stewardship of research data and metadata in life sciences. The initiative does not suggest any technology or protocol to achieve this goal, but rather provides a set of guiding principles and a framework to make research data Findable, Accessible, Interoperable and Re-usable (‘FAIR’).

When?

The initiative was launched in 2014.

Who?

The initiative started as a follow-up to the ‘Jointly designing a Data FAIRport’ workshop. The workshop was organised by the Netherlands eScience Center and the Dutch Techcentre for the Life Sciences (DTL), with attendees from leading international research infrastructures and policy institutes, publishers, semantic web specialists, innovators, computer scientists and experimental (e)Scientists.

Why?

The initiative aims to help the life sciences research community reconcile the increasing volume of research data produced with the ability to analyse and link those data, which has not developed at the same speed.

The Open Science Element

  • Guiding principles to make research data Findable, Accessible, Interoperable and Re-usable (‘FAIR’)
  • A framework for the practical implementation of the principles
  • Technological solutions based on ‘Hackathons’ and/or ‘Bring Your Own Data’ parties

Zenodo

  • Enables researchers to store and share journal publications and supporting data
  • Fosters open collaboration among researchers in different fields and from different institutions
  • Contributes to changing publishers’, funders’ and researchers’ attitudes towards open science

What?

A general-purpose open access repository of research data and journal publications.

When?

Zenodo was launched in 2013.

Who?

Zenodo is hosted at the European Organisation for Nuclear Research (CERN). It was created as part of the Open Access Infrastructure for Research in Europe (OpenAIRE) project funded by the EC.

Why?

The repository was created to foster open collaboration among researchers from all types of institutions across all fields of science.

The Open Science Element

The repository aims to become a model for:

  • Open access: Open sharing of research publications
  • Open data: Open sharing of research data including software, video/audio files, figures and tables, illustrations and datasets
  • Open collaboration: Open collaboration among researchers and between funders and publishers through the creation of communities

Reproducibility Project

  • Demonstrated an open, collaborative methodology
  • Informed debates about scientific reproducibility
  • Is helping drive change among publishers and funders

What?

The project was a collaborative effort to replicate 100 psychology experiments. Only about 40 per cent of the original findings could be replicated.

When?

2011-2015.

Who?

The project was initiated and coordinated by Professor Brian Nosek, who now leads the Center for Open Science in Virginia, USA. The replications were carried out by 270 researchers around the world.

Why?

The project was set up to systematically explore the reproducibility of scientific findings, focusing on the field of psychology.

The Open Science Element

  • Open sharing of research designs and protocols
  • Interactions between the original researchers and those replicating their studies
  • Reuse of original materials
  • Raw data and reports on the replications made publicly available

Sloan Digital Sky Survey

  • An open research data project
  • Enabled a comprehensive mapping of the universe
  • The largest open access database of the universe in the world

What?

SDSS is an astronomical survey, collecting large data sets via a 2.5-meter optical telescope run by New Mexico State University. The objective is to reflect the large-scale structure of our universe in multi-coloured images and with these to create three-dimensional maps.

When?

SDSS was launched in 1990.

Who?

The SDSS is conducted by the Astrophysical Research Consortium (ARC), a non-profit partnership among research universities and laboratories, and brings together research teams from leading astronomical institutes.

Why?

To provide accurate measurements for a large number of galaxies. It was designed to gather enough data to address a broad range of issues in astronomical inquiry, from the Milky Way to solar systems. It was built using two key technologies, optical fibres and digital imaging detectors (CCDs), whose inventors were awarded the Nobel Prize in Physics in 2009.

The Open Science Element

  • Open access to information about the universe
  • Citizen scientists classifying images through Galaxy Zoo
  • Tools including lesson plans and ideas to use Galaxy Zoo in education

Polymath Project

  • An open, collaborative initiative to find solutions to unsolved mathematics problems
  • Initiated a debate on the characteristics of and incentives for collaboration
  • Inspired a similar project to encourage students to conduct collaborative research

What?

A collaborative website where researchers and interested people with a background in mathematics try to find solutions to unsolved problems in combinatorial mathematics. Unsolved problems are published on a blog and a wiki page that summarises all the knowledge developed for that specific problem. Research and discussion threads enable researchers to post their contributions and discuss solutions.

Why?

The Polymath project was set up to understand whether ‘massive collaborative mathematics’ was a possible path for research in mathematics. Besides solving 3 of the 11 problems published to date, the project led to a parallel debate in the combinatorics research community on the characteristics that a collaborative approach should have and on the incentives for researchers to work collaboratively.

Who?

The project was started by Cambridge professor and Fields medallist Timothy Gowers, and is mainly developed together with Prof. Terence Tao, alongside Michael Nielsen, who oversees the wiki pages.

When?

2009 - present

The Open Science Element

  • The collaborative approach allows all interested researchers to suggest problems and to collaborate on finding the solution
  • It has inspired a similar project focused specifically on teaching high school and college students how to conduct research collaboratively (the ‘Crowdmath’ project)