Incomplete time-series gene expression in integrative study for islet autoimmunity prediction

Brief Bioinform. 2023 Jan 19;24(1):bbac537. doi: 10.1093/bib/bbac537.

Abstract

Type 1 diabetes (T1D) outcome prediction plays a vital role in identifying novel risk factors, ensuring early patient care and designing cohort studies. TEDDY is a longitudinal cohort study that collects a vast amount of multi-omics and clinical data from its participants to explore the progression and markers of T1D. However, missing data in the omics profiles make outcome prediction a difficult task. TEDDY collected time-series gene expression for less than 6% of enrolled participants. Additionally, among the participants whose gene expression was collected, 79% of time steps are missing. This study introduces an advanced bioinformatics framework for gene expression imputation and islet autoimmunity (IA) prediction. The imputation model generates synthetic data for participants with partially or entirely missing gene expression. The prediction model integrates the synthetic gene expression with other risk factors to achieve better predictive performance. Comprehensive experiments on TEDDY datasets show that: (1) Our pipeline can effectively integrate synthetic gene expression with family history, HLA genotype and SNPs to better predict IA status at 2 years (sensitivity 0.622, AUC 0.715) compared with the individual datasets and state-of-the-art results in the literature (AUC 0.682). (2) The synthetic gene expression contains predictive signals as strong as the true gene expression, reducing reliance on expensive and long-term longitudinal data collection. (3) Time-series gene expression is crucial to the proposed improvement and shows significantly better predictive ability than cross-sectional gene expression. (4) Our pipeline is robust to limited data availability. Availability: Code is available at https://github.com/compbiolabucf/TEDDY.
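
For readers unfamiliar with the two metrics reported in the abstract, the following minimal sketch shows how sensitivity and AUC are conventionally computed for a binary outcome such as IA status. It is an illustration only: the labels and scores are made-up toy values (not TEDDY data), the function names `sensitivity` and `auc` are hypothetical, the 0.5 decision threshold is an assumption, and the AUC uses the standard pairwise-ranking definition rather than any tooling confirmed to be used by the authors.

```python
def sensitivity(y_true, y_pred):
    """Fraction of actual positives that were predicted positive (recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): 1 = IA positive, 0 = IA negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2, 0.7, 0.8]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(sensitivity(y_true, y_pred))  # 0.75
print(auc(y_true, scores))          # 0.8125
```

Sensitivity depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why abstracts such as this one commonly report both.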

Keywords: autoencoders; incomplete time-series gene expression; islet autoimmunity prediction; long short-term memory; multi-omics; type-1 diabetes.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Autoimmunity / genetics
  • Cross-Sectional Studies
  • Diabetes Mellitus, Type 1* / genetics
  • Gene Expression
  • Genetic Predisposition to Disease
  • Humans
  • Islets of Langerhans*
  • Longitudinal Studies
  • Time Factors