Standards for better (re-)use of data – Day 3 of workshop series on methodology

Our May workshop series on standards came to an end on 20th May 2020 with discussions and exchanges on how to make use of existing data to enhance our understanding of microbiome structures and functions (read more on Day 1 and Day 2). As before, introductory talks on potential opportunities and hurdles to overcome were intended to help workshop participants define critical steps and develop a roadmap towards better use of existing data.

Why re-use data?

With advances in technology, microbiome data have become available in large volumes through online sharing platforms. However, essential data characteristics (a.k.a. meta-data) are often insufficiently provided. Meta-data could be anything from the temperature at which soil samples were collected to the age of the animal whose faeces were taken to examine its microbiome. In the case of animal experimentation, even if microbiomes are just sampled from faeces and there is no harm to the animals, re-using existing data would save animals (of course!), as well as the time and effort needed to repeat experiments done in a similar manner. In some cases, data do exist but are not made publicly available, for instance when industrial partners have IP concerns. Finally, different data sharing platforms may capture data differently, making it difficult to compare data easily. With the large amount of data available, sorting them by hand is time-consuming.

So, how can the research community make efficient use of all the existing data, and of the data still being produced, to save financial resources and time?

Many factors to consider…

The discussions on data were far-reaching, covering:

What is actually doable with the current data-sets and what is not (i.e. where are there still technological limitations, for instance in telling whether we are dealing with active microbiomes or dead microbes in soil)?
What do we need for machine learning and the development of artificial intelligence that could support the analysis of the huge amounts of omics data we can now get out of microbiome samples?
How do we deal with all the different types of microbes, not just bacteria?
Can we go beyond what organisms are present in a microbiome and understand their function (i.e. what they do)?
How do we integrate different types of data with each other?
FAIR data sharing
What are the best tools and which ones are best for what type of analyses?
What are the practical aspects of IT needs to store and process large amounts of data, and financially maintain a centralised infrastructure?
Ethical issues around meta-data particularly in human studies
And many more

… yet what is best?

There was a general acknowledgement that, despite efforts so far to introduce benchmarks, the research community at large is probably still at the beginning of the standards journey. The interdisciplinary exchange between microbiologists and bioinformaticians was considered crucial to capture all relevant factors for solid data use principles, yet at the same time there was general agreement that a single gold standard capturing everything on data use and re-use in microbiome research is unlikely.

Next steps

As with previous workshops, MicrobiomeSupport partners will carefully consider the input received during the workshop and start developing a roadmap that will be published, hopefully later in 2020. In the meantime, follow us on Twitter or sign up to our Newsletter to make sure you are among the first to know once we are ready to share!