IsoDA: Isoform-Disease Association Prediction by Multiomics Data Fusion

J Comput Biol. 2021 Aug;28(8):804-819. doi: 10.1089/cmb.2020.0626. Epub 2021 Apr 7.

Abstract

A gene can be spliced into different isoforms by alternative splicing, which contributes to the functional diversity of protein species. Computational prediction of gene-disease associations (GDAs) has been studied for decades. However, large-scale identification of isoform-disease associations (IDAs), which could help decipher pathology at a more granular level, has rarely been explored. The main bottlenecks are the lack of IDAs in current databases and the difficulty of fusing multilevel omics data. To bridge this gap, we propose a computational approach called Isoform-Disease Association prediction by multiomics data fusion (IsoDA) to predict IDAs. Based on the relationship between a gene and its spliced isoforms, IsoDA first introduces a dispatch and aggregation term that dispatches GDAs to individual isoforms and, in reverse, aggregates these dispatched associations back to their hosting genes. At the same time, it fuses genome, transcriptome, and proteome data by joint matrix factorization to improve the prediction of IDAs. Experimental results show that IsoDA significantly outperforms related state-of-the-art methods at both the gene level and the isoform level. A case study further shows that IsoDA credibly identifies three isoforms spliced from apolipoprotein E that have individual associations with Alzheimer's disease, and two isoforms spliced from vascular endothelial growth factor A that have different associations with coronary heart disease. The code of IsoDA is available at http://mlda.swu.edu.cn/codes.php?name=IsoDA.
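
As a rough illustration of the dispatch and aggregation idea described in the abstract, the following Python sketch copies gene-level disease labels to each gene's isoforms and then pools isoform-level scores back to the hosting gene. It is not the IsoDA implementation: in the paper this term is part of a joint matrix factorization objective, whereas the sketch assumes simple max-pooling (a common multi-instance learning convention) and non-negative scores. The function names `dispatch` and `aggregate` and the `isoform_to_gene` index array are hypothetical.

```python
import numpy as np

def dispatch(gene_disease, isoform_to_gene):
    """Copy each gene's disease row to every isoform spliced from it.

    gene_disease: (n_genes, n_diseases) array of gene-disease associations.
    isoform_to_gene: (n_isoforms,) array giving the hosting gene of each isoform.
    """
    return gene_disease[isoform_to_gene]

def aggregate(isoform_disease, isoform_to_gene, n_genes):
    """Pool isoform scores back to hosting genes (assumed max-pooling).

    Scores are assumed non-negative, so a zero-initialised output is safe.
    """
    pooled = np.zeros((n_genes, isoform_disease.shape[1]))
    # Unbuffered in-place maximum, so isoforms sharing a gene all contribute.
    np.maximum.at(pooled, isoform_to_gene, isoform_disease)
    return pooled

# Toy data: gene 0 has two isoforms, gene 1 has one.
isoform_to_gene = np.array([0, 0, 1])
gene_disease = np.array([[1.0, 0.0],
                         [0.0, 1.0]])

initial_isoform_labels = dispatch(gene_disease, isoform_to_gene)

# Hypothetical refined isoform-level scores, e.g. after model fitting.
isoform_scores = np.array([[0.2, 0.0],
                           [0.9, 0.1],
                           [0.0, 0.7]])
gene_scores = aggregate(isoform_scores, isoform_to_gene, n_genes=2)
```

Under max-pooling, a gene is associated with a disease if at least one of its isoforms is, which is why dispatching labels and then aggregating them recovers the original gene-level matrix in this toy case.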

Keywords: alternative splicing; data fusion and multi-instance learning; isoform-disease association; multi-omics data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alternative Splicing*
  • Computational Biology / methods*
  • Gene Expression Profiling
  • Genetic Predisposition to Disease / genetics*
  • Genomics
  • Proteomics
  • Software