Classification of multiple cancer types by combination of plasma-based near-infrared spectroscopy analysis and machine learning modeling

Anal Biochem. 2023 May 15:669:115120. doi: 10.1016/j.ab.2023.115120. Epub 2023 Mar 24.

Abstract

Background and aim: Near-infrared spectroscopy (NIRS) is a non-invasive and convenient tool that captures features related to the chemical components of biological samples. Machine learning (ML) has become widely used in medical diagnosis. This study aimed to investigate a novel cancer diagnosis strategy based on ML modeling of NIRS data.

Methods: Plasma samples were collected from a total of 247 participants, comprising patients with lung cancer, cervical cancer, or nasopharyngeal cancer and healthy controls, and were randomly split into a training set and a test set. After NIRS analysis, the training set was used to train ML models, including partial least-squares (PLS), random forest (RF), gradient boosting machine (GBM), and support-vector machine (SVM). The models were then evaluated for prediction performance on the test set.
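The random split described in the Methods can be illustrated with a minimal sketch. The abstract does not state the split ratio or whether the split was stratified by group, so the `stratified_split` helper, the 30% test fraction, the seed, and the toy labels below are all assumptions made for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), splitting each class in roughly the same ratio.

    Hypothetical helper: the abstract only says the samples were "randomly split".
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Keep at least one sample of every class in the test set.
        n_test = max(1, round(len(members) * test_fraction))
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels for illustration only; the real per-group counts are not given in the abstract.
labels = (["lung"] * 10 + ["cervical"] * 10
          + ["nasopharyngeal"] * 10 + ["control"] * 10)
train_idx, test_idx = stratified_split(labels)
```

Stratifying by group keeps every cancer type and the controls represented in both sets, which matters when some groups are small. The models would then be fitted on the spectra indexed by `train_idx` and scored on those indexed by `test_idx`.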

Results: All ML models demonstrated high prediction performance in differentiating cancers from controls, and SVM had high prediction accuracy for different types of cancers. SVM was considered the most suitable model because of its minimal computational cost and high accuracy in both binary and quaternary classification.
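To make the distinction between binary and quaternary accuracy concrete, the sketch below scores one set of predictions both ways. The `accuracy` and `to_binary` helpers and the six toy predictions are hypothetical and are not results from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def to_binary(labels, control="control"):
    """Collapse the four groups into cancer vs. control."""
    return ["control" if label == control else "cancer" for label in labels]


# Toy labels for illustration only (not data from the study).
y_true = ["lung", "cervical", "nasopharyngeal", "control", "lung", "control"]
y_pred = ["lung", "lung", "nasopharyngeal", "control", "cervical", "control"]

# Quaternary: the exact group must match (4 of 6 correct here).
quaternary_acc = accuracy(y_true, y_pred)

# Binary: confusing one cancer type with another still counts as correct.
binary_acc = accuracy(to_binary(y_true), to_binary(y_pred))
```

In this toy case the two errors are confusions between cancer types, so binary accuracy is 1.0 while quaternary accuracy is about 0.67. A model can therefore separate cancers from controls well and still be weaker at telling cancer types apart.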

Conclusions: This strategy of coupling NIRS with ML offers insights that may aid clinical cancer diagnosis, although further studies should test our results in a larger, more representative cohort.

Keywords: Cervical cancer; Diagnosis; Lung cancer; Machine learning; Nasopharyngeal cancer; Near-infrared spectroscopy.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Female
  • Humans
  • Least-Squares Analysis
  • Machine Learning
  • Nasopharyngeal Neoplasms* / diagnosis
  • Spectroscopy, Near-Infrared / methods
  • Support Vector Machine
  • Uterine Cervical Neoplasms*