Imaging data is increasingly important in the healthcare and life science industries, with artificial intelligence able to train itself to find needles in haystacks and arrive at verdicts that would take humans significantly longer. Such projects typically work on siloed data, however, as there is no central repository that allows data types to be linked and shared among the research community.
Researchers from the University of Dundee, the European Bioinformatics Institute (EMBL-EBI), the University of Bristol and the University of Cambridge hope to rectify that via a new repository called the Image Data Resource (IDR), with their work described in a recently published paper.
It's believed to be the first biological image repository that is capable of storing and integrating data from multiple laboratories, whilst also significantly enhancing the potential for sharing and reusing imaging data.
"Imaging will only be truly transformative for science if we make the data publicly available," the team say. "Scientists should be able to query existing data to identify commonalities and patterns. But to make this possible we need a robust platform where researchers can upload their imaging data and easily access data from other experiments. The Image Data Resource is the first step towards creating a public image data repository for the life sciences."
The team reveal that whilst there are various image repositories around the world, none of them are generic or linked to other forms of biomolecular data. That makes it difficult to reuse the data for new studies.
Such an environment has been difficult to build, in large part because of the complexity and heterogeneity of medical image data. It has also required huge computing resources and a large dollop of curation expertise.
"Imaging data is large, yes, but the real challenge is that it is heterogeneous and multidimensional," the team say. "Curating, storing and analysing imaging data require significant effort and computing power. The creation of the IDR prototype was only possible thanks to a strong collaboration between several scientific organisations."
By pooling image data with other types of medical data, IDR promises to be extremely valuable to the research community. For instance, it wouldn't just show you the image of a cell, but would also describe the image and the conclusions that can be drawn from it.
IDR contains a broad range of data, including:
- High-content screening
- Super-resolution microscopy
- Time-lapse imaging
- Digital pathology imaging
- Experimental protocol metadata
- Observed effects in cells and features
- Cross references with molecular archives
The researchers showed off the potential of IDR in projects at both Dundee and the University of Bristol. The projects identified genes from various studies that caused cells to elongate when mutated or removed. By merging together data from various studies, they were able to develop a gene network that gave them a better view of how the genes affected the shape of the cell, giving them important insight into understanding metastatic cancer.
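To make the idea of pooling results across studies a little more concrete, here is a minimal Python sketch. It is not the team's method and does not use the IDR's own software: the study names, gene names, phenotype labels and the `pool_hits` helper are all made up for illustration. It simply assumes each study reports (study, gene, phenotype) annotations and collects the genes linked to one phenotype, along with the studies that reported them.

```python
from collections import defaultdict

# Toy, invented annotations in the form (study, gene, observed phenotype).
# These are illustrative placeholders, not real IDR data.
ANNOTATIONS = [
    ("study_A", "GENE1", "elongated cell"),
    ("study_A", "GENE2", "elongated cell"),
    ("study_B", "GENE2", "elongated cell"),
    ("study_B", "GENE3", "elongated cell"),
    ("study_B", "GENE4", "round cell"),
    ("study_C", "GENE1", "elongated cell"),
]


def pool_hits(records, phenotype):
    """Map each gene annotated with `phenotype` to the sorted list of
    studies that reported it (hypothetical helper for this sketch)."""
    studies_by_gene = defaultdict(set)
    for study, gene, observed in records:
        if observed == phenotype:
            studies_by_gene[gene].add(study)
    return {gene: sorted(studies) for gene, studies in studies_by_gene.items()}


def replicated(hits, min_studies=2):
    """Return the genes reported in at least `min_studies` studies."""
    return sorted(g for g, studies in hits.items() if len(studies) >= min_studies)


if __name__ == "__main__":
    hits = pool_hits(ANNOTATIONS, "elongated cell")
    for gene, studies in sorted(hits.items()):
        print(gene, studies)
    print("Seen in more than one study:", replicated(hits))
```

Genes that turn up in several independent studies are natural starting points for the kind of network analysis described above, which is only practical when the annotations from different laboratories sit in one place and use shared terms.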
"Expanding the public archives to include imaging is of huge interest to the biotech industry and drug development companies. It offers potential to identify new therapies and targets, and broadens the scope of research by allowing scientists around the world to access each other's imaging datasets," the team say.
They next hope to secure the level of support needed to scale the project up into production-ready infrastructure. The open source nature of the software and technology also opens up the possibility of the tech being deployed in other image data systems. All in all, it's an important step towards a more open form of science.