Diagnosis of colorectal cancer by near-infrared optical fiber spectroscopy and random forest

Spectrochim Acta A Mol Biomol Spectrosc. 2015 Jan 25:135:185-91. doi: 10.1016/j.saa.2014.07.005. Epub 2014 Jul 10.

Abstract

Near-infrared (NIR) spectroscopy is noninvasive, fast, and relatively inexpensive, and carries no risk of ionizing radiation. Differences in NIR signals can reflect many physiological changes, which are in turn associated with factors such as vascularization, cellularity, oxygen consumption, or remodeling. NIR spectral differences between colorectal cancer and healthy tissues were investigated. A Fourier transform NIR spectroscopy instrument equipped with a fiber-optic probe was used to mimic in situ clinical measurements. A total of 186 spectra were collected and then preprocessed by standard normal variate (SNV) transformation to remove unwanted background variance. All specimens and spots used for spectral collection were confirmed by staining and examination by an experienced pathologist to ensure that they were representative of the pathology. Principal component analysis (PCA) was used to uncover possible clustering. Several methods, including random forest (RF), partial least squares-discriminant analysis (PLSDA), K-nearest neighbor, and classification and regression tree (CART), were used to extract spectral features and to construct the diagnostic models. The comparison reveals that, although no obvious difference in misclassification ratio (MCR) was observed between these models, RF is preferable since it is quicker, more convenient, and insensitive to over-fitting. The results indicate that NIR spectroscopy coupled with an RF model can serve as a potential tool for discriminating colorectal cancer tissues from normal ones.
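
The SNV preprocessing step named in the abstract can be illustrated with a minimal sketch. It assumes the common definition of SNV, in which each spectrum is centered on its own mean and scaled by its own sample standard deviation; the abstract does not describe the authors' implementation, so the function name `snv` and the absorbance values below are hypothetical and serve only as illustration.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate, assuming the common definition:
    center one spectrum on its own mean and scale it by its own
    sample standard deviation (n - 1 denominator)."""
    mu = mean(spectrum)
    sigma = stdev(spectrum)
    if sigma == 0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(x - mu) / sigma for x in spectrum]


# Hypothetical absorbance values, invented for illustration only.
raw = [0.52, 0.61, 0.75, 0.68, 0.59]

# The same spectral shape with a multiplicative scaling and a constant
# offset, mimicking a background effect unrelated to tissue chemistry.
shifted = [2.0 * x + 0.3 for x in raw]

corrected_raw = snv(raw)
corrected_shifted = snv(shifted)
```

Under this definition, the two corrected spectra should coincide up to floating-point error, which is the sense in which SNV removes per-spectrum offset and scaling before models such as RF or PLSDA are fitted.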

Keywords: Biodiagnostics; Chemometrics; Random forest; Spectrometry.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Algorithms*
  • Colorectal Neoplasms / diagnosis*
  • Decision Trees
  • Discriminant Analysis
  • Female
  • Humans
  • Least-Squares Analysis
  • Male
  • Middle Aged
  • Optical Fibers*
  • Principal Component Analysis
  • Spectroscopy, Near-Infrared / methods*