Open access and the need to turn data into knowledge

BMC Biology began ten years ago, and BioMed Central three years earlier, right at the beginning of open-access publishing in biology. As part of BMC Biology’s 10-year anniversary, Editorial Board member Pat Brown talks about the origins of open access and the circumstances that led him to become one of the early open-access agitators. The rapid growth of the internet and the possibility of publishing without print are obvious enablers of open access, but Brown also discusses another driver whose growth has accelerated along with open-access publishing. As much as it is a product of the internet age, open access is also a product of the dawn of the high-throughput era in biology. The jump from analysing a handful of genes on a Northern blot to the thousands of genes interrogated in early microarray experiments created a strong need to connect experimental results to a broader swathe of the existing literature. Brown describes the frustration he encountered in trying to do exactly this, and how his original vision of open access was so radical that much of it has yet to happen, in particular immediate data-sharing and reuse, with analytic tools to exploit the data to the full.

Curating data

Mike Tyers and Kara Dolinski share this concern. In an anniversary update on their 2006 paper describing the BioGRID database – originally holding a multitude of protein and genetic interaction data for the baker’s yeast Saccharomyces cerevisiae, and now expanded to more than 30 species – they discuss the urgent need for curation to make the best use, and indeed sense, of the huge interaction datasets that make up a significant part of modern biology. The volume of data is overwhelming, and while BioGRID now contains comprehensive curation and annotation of interactions for a number of species, the literature is growing at such a speed that manual curators cannot keep up, and computerised curation isn’t yet sophisticated enough to be a viable alternative. So what’s the answer? Tyers and Dolinski suggest that three things may be particularly important: deposition, at the time of publication, of structured experimental data records in a standardised format that can be easily linked to and combined with other data; the addition of metadata that reconciles contradictory data in the literature and gives investigators an estimate of its reliability; and a unified database of human and model organism interaction data that allows inference across species to conserved but poorly understood human genes. None of these is trivial, but no one ever said turning data into knowledge would be easy.
