Isoform function prediction by Gene Ontology embedding

Bioinformatics. 2022 Sep 30;38(19):4581-4588. doi: 10.1093/bioinformatics/btac576.

Abstract

Motivation: High-resolution annotation of gene functions is a central task in functional genomics. Multiple proteoforms translated from alternatively spliced isoforms from a single gene are actual function performers and greatly increase the functional diversity. The specific functions of different isoforms can decipher the molecular basis of various complex diseases at a finer granularity. Multi-instance learning (MIL)-based solutions have been developed to distribute gene(bag)-level Gene Ontology (GO) annotations to isoforms(instances), but they simply presume that a particular annotation of the gene is responsible by only one isoform, neglect the hierarchical structures and semantics of massive GO terms (labels), or can only handle dozens of terms.

Results: We propose an efficacy approach IsofunGO to differentiate massive functions of isoforms by GO embedding. Particularly, IsofunGO first introduces an attributed hierarchical network to model massive GO terms, and a GO network embedding strategy to learn compact representations of GO terms and project GO annotations of genes into compressed ones, this strategy not only explores and preserves hierarchy between GO terms but also greatly reduces the prediction load. Next, it develops an attention-based MIL network to fuse genomics and transcriptomics data of isoforms and predict isoform functions by referring to compressed annotations. Extensive experiments on benchmark datasets demonstrate the efficacy of IsofunGO. Both the GO embedding and attention mechanism can boost the performance and interpretability.

Availabilityand implementation: The code of IsofunGO is available at http://www.sdu-idea.cn/codes.php?name=IsofunGO.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology*
  • Gene Ontology
  • Molecular Sequence Annotation
  • Protein Isoforms / genetics
  • Semantics*

Substances

  • Protein Isoforms