Two complementary AI approaches for predicting UMLS semantic group assignment: heuristic reasoning and deep learning

J Am Med Inform Assoc. 2023 Nov 17;30(12):1887-1894. doi: 10.1093/jamia/ocad152.

Abstract

Objective: To use heuristic, deep learning (DL), and hybrid AI methods to predict semantic group (SG) assignments for new UMLS Metathesaurus atoms, with a target accuracy of ≥95%.

Materials and methods: We used train-test datasets from successive 2020AA-2022AB UMLS Metathesaurus releases. Our heuristic "waterfall" approach employed a sequence of 7 different SG prediction methods. Atoms not qualifying for a method were passed on to the next method. The DL approach generated BioWordVec and SapBERT embeddings for atom names, BioWordVec embeddings for source vocabulary names, and BioWordVec embeddings for atom names of the second-to-top nodes of an atom's source hierarchy. We fed a concatenation of the 4 embeddings into a fully connected multilayer neural network with an output layer of 15 nodes (one for each SG). For both approaches, we developed methods to estimate the probability that their predicted SG for an atom would be correct. Based on these estimations, we developed 2 hybrid SG prediction methods combining the strengths of heuristic and DL methods.
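The waterfall control flow described above can be illustrated with a minimal sketch. It shows only the pass-through logic: each method either returns an SG or declines, and a declined atom falls to the next method. The two methods, their lookup tables, and the atom fields (`name`, `source`) are hypothetical placeholders, not the 7 methods used in the study.

```python
from typing import Callable, Optional, Sequence, Tuple

Atom = dict
# A method returns an SG label, or None when the atom does not qualify for it.
Method = Callable[[Atom], Optional[str]]


def by_exact_name(atom: Atom) -> Optional[str]:
    """Hypothetical method: reuse the SG of an already-known identical name."""
    known_names = {"aspirin": "CHEM"}  # illustrative lookup table
    return known_names.get(atom["name"].lower())


def by_source_vocabulary(atom: Atom) -> Optional[str]:
    """Hypothetical method: some source vocabularies map to a single SG."""
    source_defaults = {"RXNORM": "CHEM"}  # illustrative lookup table
    return source_defaults.get(atom.get("source"))


def waterfall_predict(
    atom: Atom, methods: Sequence[Method]
) -> Tuple[Optional[str], int]:
    """Try each method in order; return (SG, index of the method that fired).

    Returns (None, -1) when no method qualifies.
    """
    for index, method in enumerate(methods):
        sg = method(atom)
        if sg is not None:
            return sg, index
    return None, -1


METHODS = [by_exact_name, by_source_vocabulary]
```

Returning the index of the method that fired is a design choice of this sketch: it keeps a record of which step produced each prediction, which is one possible input for estimating how likely that prediction is to be correct.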

Results: The heuristic waterfall approach correctly predicted the SG for 94.3% of 1 563 692 new, unseen atoms. The DL accuracy on the same dataset was also 94.3%. The hybrid approaches achieved an average accuracy of 96.5%.

Conclusion: Our study demonstrated that AI methods can predict SG assignments for new UMLS atoms with sufficient accuracy to be potentially useful as an intermediate step in the time-consuming task of assigning new atoms to UMLS concepts. We showed that for SG prediction, combining heuristic methods and DL methods can produce better results than either alone.
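One simple way to combine two predictors that each report an estimated probability of being correct is to keep the more confident prediction when they disagree. The sketch below illustrates that idea only. The abstract does not specify how the 2 hybrid methods work, so the selection rule, the `Prediction` type, and the function name are assumptions made for illustration.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    sg: str           # predicted semantic group, e.g. "CHEM" or "DISO"
    p_correct: float  # estimated probability that the prediction is correct


def hybrid_predict(heuristic: Prediction, dl: Prediction) -> Prediction:
    """Assumed rule: keep whichever prediction has the higher estimated
    probability of being correct; ties go to the heuristic prediction."""
    if heuristic.p_correct >= dl.p_correct:
        return heuristic
    return dl


# Usage: the two approaches disagree, and DL is more confident.
choice = hybrid_predict(Prediction("CHEM", 0.80), Prediction("DISO", 0.97))
```

A rule of this kind can exceed either input's accuracy only when the probability estimates are informative, that is, when the two approaches tend to make their errors on different atoms and the estimates reflect that.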

Keywords: artificial intelligence; deep learning; heuristic reasoning; semantic network; Unified Medical Language System.

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Deep Learning*
  • Heuristics*
  • Neural Networks, Computer
  • Semantics
  • Unified Medical Language System

Associated data

  • Dryad/10.5061/dryad.dfn2z356z