PhenoBERT: A Combined Deep Learning Method for Automated Recognition of Human Phenotype Ontology

IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):1269-1277. doi: 10.1109/TCBB.2022.3170301. Epub 2023 Apr 3.

Abstract

Automated recognition of Human Phenotype Ontology (HPO) terms from clinical texts is of significant interest to the field of clinical data mining. In this study, we develop a combined deep learning method named PhenoBERT for this purpose. PhenoBERT uses BERT, currently the state-of-the-art NLP model, as its core model for evaluating whether a clinically relevant text segment (CTS) could be represented by an HPO term. However, to avoid unnecessary comparison of a CTS with each of ∼14,000 HPO terms using BERT, we introduce a two-levels CNN module consisting of a series of CNN models organized at two levels in PhenoBERT. For a given CTS, the CNN module produces only a short list of candidate HPO terms for BERT to evaluate, significantly improving the computational efficiency. In addition, BERT is able to assign an ancestor HPO term to a CTS when recognition of the direct HPO term is not successful, mimicking the process of HPO term assignment by human. In two benchmarks, PhenoBERT outperforms four traditional dictionary-based methods and two recently developed deep learning-based methods in two benchmark tests, and its advantage is more obvious when the recognition task is more challenging. As such, PhenoBERT is of great use for assisting in the mining of clinical text data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking
  • Data Mining
  • Deep Learning*
  • Family
  • Humans
  • Phenotype