TiMEG: an integrative statistical method for partially missing multi-omics data

Sarmistha Das; Indranil Mukhopadhyay

doi:10.1038/s41598-021-03034-z

Sci Rep. 2021 Dec 15;11(1):24077. doi: 10.1038/s41598-021-03034-z.

Authors

Sarmistha Das¹,², Indranil Mukhopadhyay³

Affiliations

¹ Human Genetics Unit, Indian Statistical Institute, Kolkata, 700108, India.
² Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, 38105, USA.
³ Human Genetics Unit, Indian Statistical Institute, Kolkata, 700108, India. indranil@isical.ac.in.

Abstract

Multi-omics data integration is widely used to understand the genetic architecture of disease. In multi-omics association analysis, data collected on multiple omics for the same set of individuals are immensely important for biomarker identification. But when the sample size of such data is limited, the presence of partially missing individual-level observations poses a major challenge in data integration. Often, genotype data are available for all individuals under study, but gene expression and/or methylation information is missing for different subsets of those individuals. Here, we develop a statistical model, TiMEG, for the identification of disease-associated biomarkers in a case-control paradigm by integrating the above-mentioned data types, especially in the presence of missing omics data. Based on a likelihood approach, TiMEG exploits the inter-relationship among multiple omics data to capture weaker signals that remain unidentified in single-omic analysis or by common imputation-based methods. Its application to a real tuberous sclerosis dataset identified functionally relevant genes in the disease pathway.

Publication types

Research Support, Non-U.S. Gov't