Missing data in multi-omics integration: Recent advances through artificial intelligence

Front Artif Intell. 2023 Feb 9;6:1098308. doi: 10.3389/frai.2023.1098308. eCollection 2023.

Abstract

Biological systems function through complex interactions between various 'omics (biomolecules), and a more complete understanding of these systems is only possible through an integrated, multi-omic perspective. This has created a need for integration approaches that can capture the complex, often non-linear, interactions that define these biological systems and that are adapted to the challenges of combining heterogeneous data across 'omic views. A principal challenge to multi-omic integration is missing data, because not all biomolecules are measured in all samples. Due to cost, instrument sensitivity, or other experimental factors, data for a biological sample may be missing for one or more 'omic technologies. Recent methodological developments in artificial intelligence and statistical learning have greatly facilitated the analysis of multi-omics data; however, many of these techniques assume access to completely observed data. A subset of these methods incorporates mechanisms for handling partially observed samples, and these methods are the focus of this review. We describe recently developed approaches, noting their primary use cases and highlighting each method's approach to handling missing data. We additionally provide an overview of the more traditional missing data workflows and their limitations, and we discuss potential avenues for further development as well as how the missing data issue and its current solutions may generalize beyond the multi-omics context.

Keywords: Bayesian; artificial intelligence; data integration; machine learning; missing data; multi-omics; multi-view; neural networks.

Publication types

  • Review

Grants and funding

This research was supported by the Predictive Phenomics Initiative Laboratory Directed Research and Development Program at Pacific Northwest National Laboratory (PNNL) and capability development funding from the Environmental Molecular Sciences Laboratory (EMSL; grid.436923.9), a DOE Office of Science User Facility located at PNNL and sponsored by the DOE Office of Biological and Environmental Research. PNNL is operated by Battelle for the DOE under Contract DE-AC05-76RL01830.