ChatGPT for phenotypes extraction: one model to rule them all?

Annu Int Conf IEEE Eng Med Biol Soc. 2023 Jul:2023:1-4. doi: 10.1109/EMBC40787.2023.10340611.

Abstract

Information Extraction (IE) is a core task in Natural Language Processing (NLP) where the objective is to identify factual knowledge in textual documents (often unstructured) and to feed downstream use cases with the resulting output. In genomic medicine, for instance, extracting the most precise possible list of phenotypes associated with a patient improves genetic disease diagnosis, which makes it a vital step in the modern deep phenotyping approach. As most phenotypic information lies in clinical reports, the challenge is to build an IE pipeline that automatically recognizes phenotype concepts in free-text notes. A new machine learning paradigm built around large language models (LLMs) has recently given rise to an increasing number of academic works on this topic, in which sophisticated combinations of different techniques have been employed to improve phenotype extraction accuracy. The even more recent release of the ChatGPT application nevertheless raises the question of whether these approaches remain relevant compared with this new generic one, which is based on an instruction-oriented LLM. In this paper, we propose a rigorous evaluation of ChatGPT and the current state-of-the-art solutions on this specific task, and discuss the possible impacts and the technical evolutions to consider in the medical domain.

Clinical relevance: Deep phenotyping on electronic health records has proven its ability to improve genetic diagnosis from clinical exomes [10]. Comparing state-of-the-art solutions in order to derive insights and improve research directions is therefore essential.

MeSH terms

  • Electronic Health Records
  • Humans
  • Information Storage and Retrieval*
  • Language
  • Machine Learning*
  • Phenotype