15 Integrating single-cell ’omics datasets
This is a very big topic, too big to cover in depth in this workshop!
However, we have already seen a few approaches to integrating single-cell RNA-seq data. This chapter is an opportunity to discuss these ideas further and to develop something of a taxonomy of data integration aims and approaches for single-cell ’omics data.
“Data integration” of single-cell ’omics data may cover any or all of the following:
- Batch correction within an experiment/study;
- Using a reference (“atlas”) to inform analysis of a generated dataset;
- Combining data from the same biological system but across different studies/labs/platforms;
- Making use of multiple ’omics “views”, whether measured from the same cells or not;
- … and many more possibilities!
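To make the first item in this list concrete, here is a minimal sketch of the simplest conceivable batch correction: subtracting each batch's per-gene mean from the cells in that batch. It is a toy illustration only, not a method recommended or used in this workshop. The function name `center_by_batch` and the toy data are hypothetical, and the sketch assumes log-scale expression values and batches with similar cell-type composition. If composition differs between batches, centering like this would also remove real biological differences.

```python
from collections import defaultdict


def center_by_batch(expression, batches):
    """Subtract each batch's per-gene mean from the cells in that batch.

    expression: list of cells, each a list of (log-)expression values per gene.
    batches: list of batch labels, one per cell.
    """
    if len(expression) != len(batches):
        raise ValueError("need exactly one batch label per cell")

    # Group cell indices by batch label.
    cells_in_batch = defaultdict(list)
    for i, label in enumerate(batches):
        cells_in_batch[label].append(i)

    corrected = [None] * len(expression)
    for label, idx in cells_in_batch.items():
        n_genes = len(expression[idx[0]])
        # Per-gene mean within this batch.
        means = [
            sum(expression[i][g] for i in idx) / len(idx)
            for g in range(n_genes)
        ]
        # Remove the batch-specific offset from every cell in the batch.
        for i in idx:
            corrected[i] = [expression[i][g] - means[g] for g in range(n_genes)]
    return corrected


# Toy data (hypothetical): 4 cells x 2 genes; batch "B" is shifted up by 2.
expression = [[1.0, 3.0], [3.0, 5.0], [3.0, 5.0], [5.0, 7.0]]
batches = ["A", "A", "B", "B"]
corrected = center_by_batch(expression, batches)
```

After centering, the constant offset between the two toy batches is gone and corresponding cells have identical values. Real batch effects are rarely a constant shift, which is one reason dedicated integration methods exist.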
The 12 Grand Challenges in Single-Cell Data Science preprint provides a more systematic way of thinking about the different types of single-cell ’omics data integration that we may want to achieve. The figure below lays out several distinct types of approach.
This table provides some more details and examples:
As we can see here, there are many different approaches to integrating data, and the appropriate approach depends on the data types we have and what we want to achieve. Some types of data integration are already eminently feasible; others require much more development of methods and software before they are achievable. Ultimately, it all comes back to our biological questions. The questions we want to answer will drive the data we generate and the approaches we might sensibly take to integrate those data.
A final thought: in some (many?) cases, what we might call data synthesis might be preferable to data integration. That is, we might not need or want to combine disparate data sets and data types into one holistic (and likely very challenging) analysis. Rather, we might instead analyse different data sets/types separately and synthesise what we learn from each of them to answer biological questions of interest. Kind of how science is supposed to work!
15.1 Further reading
We recommend that you read section 6.1 of the “Grand Challenges” paper for more detailed discussion of the current status of data integration for single-cell ’omics data and open problems that remain:
- Laehnemann, D. et al. (2019) 12 Grand challenges in single-cell data science. PeerJ Preprints.