A simple, scalable approach to building a cross-platform transcriptome atlas

PLoS Comput Biol. 2020 Sep 28;16(9):e1008219. doi: 10.1371/journal.pcbi.1008219. eCollection 2020 Sep.

Abstract

Gene expression atlases have transformed our understanding of the development, composition and function of human tissues. New technologies promise improved cellular or molecular resolution, and have led to the identification of new cell types, or better defined cell states. But as new technologies emerge, information derived on old platforms becomes obsolete. We demonstrate that it is possible to combine a large number of different profiling experiments summarised from dozens of laboratories and representing hundreds of donors, to create an integrated molecular map of human tissue. As an example, we combine 850 samples from 38 platforms to build an integrated atlas of human blood cells. We achieve robust and unbiased cell type clustering using a variance partitioning method, selecting genes with low platform bias relative to biological variation. Other than an initial rescaling, no other transformation to the primary data is applied through batch correction or renormalisation. Additional data, including single-cell datasets, can be projected for comparison, classification and annotation. The resulting atlas provides a multi-scaled approach to visualise and analyse the relationships between sets of genes and blood cell lineages, including the maturation and activation of leukocytes in vivo and in vitro. In allowing for data integration across hundreds of studies, we address a key reproducibility challenge faced by any new technology. This allows us to draw on the deep phenotypes and functional annotations that accompany traditional profiling methods, and provide important context to the high cellular resolution of single-cell profiling. Here, we have implemented the blood atlas in the open access Stemformatics.org platform, drawing on its extensive collection of curated transcriptome data. The method is simple, scalable and amenable for rapid deployment in other biological systems or computational workflows.
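
The gene-selection idea described above (rescale each sample, then keep genes whose variance is driven mostly by biology and not by platform) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the rescaling or the partitioning model, so the within-sample rank rescaling, the one-way sum-of-squares decomposition, the function names and the `max_fraction=0.2` cut-off are all assumptions chosen for illustration.

```python
import numpy as np


def rank_rescale(expr):
    """Rescale each sample (column) to within-sample ranks in (0, 1].

    Assumption: a simple rank transform stands in for the "initial rescaling";
    ties are broken arbitrarily by argsort.
    """
    ranks = expr.argsort(axis=0).argsort(axis=0) + 1
    return ranks / expr.shape[0]


def platform_variance_fraction(expr, platforms):
    """Per gene, the share of the total sum of squares that lies between platform means.

    expr: genes x samples array; platforms: one label per sample (column).
    Genes with zero total variance get a fraction of 1.0 so they are never selected.
    """
    platforms = np.asarray(platforms)
    grand_mean = expr.mean(axis=1)
    total_ss = ((expr - grand_mean[:, None]) ** 2).sum(axis=1)
    between_ss = np.zeros(expr.shape[0])
    for label in np.unique(platforms):
        cols = expr[:, platforms == label]
        between_ss += cols.shape[1] * (cols.mean(axis=1) - grand_mean) ** 2
    return np.divide(between_ss, total_ss,
                     out=np.ones_like(total_ss), where=total_ss > 0)


def select_low_bias_genes(expr, platforms, max_fraction=0.2):
    """Return row indices of genes whose platform share is at most max_fraction.

    max_fraction is a hypothetical threshold, not a value taken from the abstract.
    """
    fraction = platform_variance_fraction(expr, platforms)
    return np.flatnonzero(fraction <= max_fraction)


# Toy data: gene 0 varies within each platform, gene 1 varies only between platforms.
toy_expr = np.array([[1.0, 5.0, 1.0, 5.0],
                     [2.0, 2.0, 8.0, 8.0]])
toy_platforms = ["a", "a", "b", "b"]
kept = select_low_bias_genes(toy_expr, toy_platforms)  # keeps gene 0 only
```

In a real atlas the decomposition would also need to account for cell type being unevenly distributed across platforms, which this one-factor sketch ignores.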

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Data Curation
  • Gene Expression Profiling
  • Humans
  • Transcriptome*

Grants and funding

Stemformatics was established through Australian Research Council Funding to Stem Cells Australia (SRI110001002) and to CAW (Future Fellowship FT150100330) (https://www.arc.gov.au). KALC was supported by the National Health and Medical Research Council (NHMRC) Career Development fellowship (GNT1159458). PWA and JC are funded by NHMRC (GNT1181327) and (APP1186371) to CAW (https://www.nhmrc.gov.au). NR is funded by the Centre for Stem Cell Systems (https://biomedicalsciences.unimelb.edu.au/departments/anatomy-and-neuroscience/engage/cscs) and the CSIRO Synthetic Biology Future Science Platform (https://research.csiro.au/synthetic-biology-fsp/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.