Evaluation of word embedding models to extract and predict surgical data in breast cancer

Giuseppe Sgroi; Giulia Russo; Anna Maglia; Giuseppe Catanuto; Peter Barry; Andreas Karakatsanis; Nicola Rocco; ETHOS Working Group; Francesco Pappalardo

doi:10.1186/s12859-022-05038-6

BMC Bioinformatics. 2022 Nov 16;22(Suppl 14):631. doi: 10.1186/s12859-022-05038-6.

Authors

Giuseppe Sgroi¹, Giulia Russo², Anna Maglia³, Giuseppe Catanuto³ ⁴, Peter Barry³, Andreas Karakatsanis³, Nicola Rocco³; ETHOS Working Group; Francesco Pappalardo⁵

Affiliations

¹ Department of Mathematics and Computer Science, University of Catania, 95125, Catania, Italy.
² Department of Drug and Health Sciences, University of Catania, 95125, Catania, Italy.
³ G.RE.T.A. Group for Reconstructive and Therapeutic Advancements, Catania, Italy.
⁴ Multidisciplinary Breast Unit, Azienda Ospedaliera Cannizzaro, Catania, Italy.
⁵ Department of Drug and Health Sciences, University of Catania, 95125, Catania, Italy. francesco.pappalardo@unict.it.

Abstract

Background: Decisions in healthcare usually rely on the quality and completeness of data, which can be coupled with heuristics to improve the decision process itself. However, this process is often incomplete. Structured interviews known as Delphi surveys investigate experts' opinions and resolve by consensus complex matters such as those underlying surgical decision-making. Natural Language Processing (NLP) is a field of study that combines computer science, artificial intelligence, and linguistics. NLP can therefore help build a correct context for surgical data, contributing to the improvement of surgical decision-making.

Results: We applied NLP coupled with machine learning approaches to predict the context (words) with high accuracy, using as input the words nearest to those in the Delphi surveys.
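As an illustration of what "nearest words" means in an embedding space, the following minimal sketch ranks vocabulary entries by cosine similarity to a query word. It is not the authors' pipeline: the vectors are hand-made toy values rather than trained Word2Vec embeddings, and the names EMBEDDINGS, cosine and nearest_words are hypothetical.

```python
import math

# Hypothetical toy embeddings: hand-made 3-dimensional vectors chosen only
# for illustration. They are NOT trained Word2Vec vectors from the paper.
EMBEDDINGS = {
    "mastectomy":     [0.9, 0.1, 0.0],
    "reconstruction": [0.8, 0.3, 0.1],
    "implant":        [0.7, 0.4, 0.2],
    "chemotherapy":   [0.1, 0.9, 0.2],
    "questionnaire":  [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_words(query, embeddings, topn=2):
    """Return the topn (word, similarity) pairs closest to the query word."""
    q = embeddings[query]
    scored = [(w, cosine(q, v)) for w, v in embeddings.items() if w != query]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


if __name__ == "__main__":
    for word, score in nearest_words("mastectomy", EMBEDDINGS):
        print(f"{word}\t{score:.3f}")
```

With trained embeddings in place of the toy dictionary, the same ranking step would yield the candidate context words for a given survey term.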

Conclusions: The proposed methodology increases the usefulness of Delphi surveys by favoring the extraction of keywords that can represent a specific clinical context. It permits the characterization of the clinical context by suggesting words for the data evaluation process.

Keywords: Breast cancer; Machine learning; Natural language processing; Word embeddings; Word2Vec.

MeSH terms

Artificial Intelligence*
Breast Neoplasms* / surgery
Female
Humans
Machine Learning
Natural Language Processing