Disease phenotype synonymous prediction through network representation learning from PubMed database

Artif Intell Med. 2020 Jan:102:101745. doi: 10.1016/j.artmed.2019.101745. Epub 2019 Nov 19.

Abstract

Synonym mapping between phenotype concepts from different terminologies is difficult because terminology databases have been developed largely independently. Existing maps of synonymous phenotype concepts across terminology databases are highly incomplete, and manual mapping is time-consuming and laborious. An automatic method for predicting mappings between synonymous phenotypes is therefore of particular importance. We propose a classifier-based phenotype mapping prediction model (CPM) to predict synonymous relationships between phenotype concepts from different terminology databases. The model takes network semantic representations of phenotypes as input and predicts synonymous relationships by training binary classifiers and combining their outputs with a voting strategy. We compared the performance of the CPM with a similarity-based phenotype mapping prediction model (SPM), which predicts mappings from the ranked cosine similarity of candidate mapping concepts. Using the N2V-TFIDF network representation with a majority voting (MV) strategy, the CPM achieved an accuracy of 0.943, which was 15.4% higher than that of the SPM using the cosine similarity method (0.789) and 23.8% higher than that of the SSDTM method (0.724) proposed in our previous work.
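The two prediction strategies contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy three-dimensional vectors, and the tie-breaking rule are all assumptions made for illustration, and the N2V-TFIDF embeddings and the trained classifiers themselves are not reproduced here. `rank_candidates` mimics the similarity-based idea (ranking candidate concepts by cosine similarity), while `majority_vote` mimics the voting step that combines the outputs of several binary classifiers.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(query_vec, candidates):
    """Similarity-based idea: sort candidate concepts by cosine similarity, best first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def majority_vote(predictions):
    """Voting idea: combine binary predictions (1 = synonymous, 0 = not).

    Ties resolve to 0 here; that rule is an assumption of this sketch.
    """
    counts = Counter(predictions)
    return 1 if counts[1] > counts[0] else 0


# Hypothetical toy embeddings standing in for learned phenotype representations.
query = [1.0, 0.0, 1.0]
candidates = {
    "A": [1.0, 0.0, 0.9],
    "B": [0.0, 1.0, 0.0],
    "C": [0.5, 0.5, 0.5],
}

ranking = rank_candidates(query, candidates)
top_candidate = ranking[0][0]

# Hypothetical outputs of three binary classifiers for one concept pair.
vote = majority_vote([1, 1, 0])
```

In this toy setting the similarity ranking returns only an ordering of candidates, whereas the voting step yields an explicit yes/no decision per concept pair, which is the distinction the abstract draws between the SPM and the CPM.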

Keywords: Classification; Network representation; Phenotype terminology; Synonyms relation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Automation
  • Computer Simulation
  • Disease* / classification
  • Humans
  • Machine Learning
  • Neural Networks, Computer*
  • Phenotype*
  • PubMed*
  • Reproducibility of Results