Raman spectroscopy (RS), a non-invasive and label-free method, has been proposed as a means of improving the accuracy of cytological and even histopathological diagnosis. To our knowledge, this novel technique is typically employed without concrete knowledge of the underlying molecular changes in cells. Identifying Raman spectral markers for objective diagnosis is therefore necessary for universal adoption of RS. As a model study, we investigated human mammary epithelial cells (HMEpC) and breast cancer cells (MCF-7) by RS and employed several multivariate analyses (MA), including principal component analysis (PCA), linear discriminant analysis (LDA), and support vector machine (SVM), to estimate diagnostic accuracy. Furthermore, to elucidate the underlying molecular changes in cancer cells, we used multivariate curve resolution-alternating least squares (MCR-ALS) analysis with non-negativity constraints to extract physically meaningful spectra from complex cellular data. Unsupervised PCA and supervised MA, such as LDA and SVM, classified HMEpC and MCF-7 cells with high accuracy but without revealing the molecular basis of the discrimination. Using MCR-ALS analysis, we identified five pure biomolecular spectra, comprising DNA, proteins, and three independent unsaturated lipid components. The relative abundance of lipid 1 appears to be strictly regulated between the two groups of cells and could be the basis for the excellent discrimination achieved by chemometrics-assisted RS. Lipid 1 was unambiguously assigned to a linoleate-rich glyceride and therefore serves as a Raman spectral marker for reliable diagnosis. This study successfully identified Raman spectral markers and demonstrated the potential of RS to become an excellent cytodiagnostic tool that can discriminate breast cancer cells from normal cells both accurately and objectively.
Keywords: MCR-ALS; PUFA; Raman spectroscopy; breast cancer; cancer diagnosis; chemometrics; spectral marker; cytodiagnosis; linoleic acid; lipid metabolism.
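To illustrate the kind of decomposition that MCR-ALS with non-negativity constraints performs, a minimal Python sketch is given below. It is not the implementation used in this study. The function name `mcr_als`, the random initialization, the iteration count, and the use of clipping after each unconstrained least-squares step (in place of a true non-negative least-squares solver) are all assumptions made for brevity. The data matrix D (spectra by wavenumber channels) is approximated as the product of a non-negative abundance matrix C and a non-negative component-spectra matrix S.

```python
import numpy as np


def mcr_als(D, n_components, n_iter=200, seed=0):
    """Approximate D (n_spectra x n_channels) as C @ S with C >= 0 and S >= 0.

    Illustrative only: non-negativity is imposed by clipping each
    least-squares solution at zero, which is a simplification of
    constrained MCR-ALS.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical initial guess: random non-negative component spectra.
    S = rng.random((n_components, D.shape[1]))
    for _ in range(n_iter):
        # Fix S, solve S.T @ C.T ~= D.T for the abundances C.
        C = np.linalg.lstsq(S.T, D.T, rcond=None)[0].T
        C = np.clip(C, 0.0, None)
        # Fix C, solve C @ S ~= D for the component spectra S.
        S = np.linalg.lstsq(C, D, rcond=None)[0]
        S = np.clip(S, 0.0, None)
    return C, S


def synthetic_mixture(n_spectra=40, n_channels=120, seed=1):
    """Build synthetic two-component 'spectra' (not real Raman data)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_channels)
    # Two Gaussian bands stand in for pure component spectra.
    S_true = np.vstack([
        np.exp(-((x - 0.3) ** 2) / 0.005),
        np.exp(-((x - 0.7) ** 2) / 0.010),
    ])
    C_true = rng.random((n_spectra, 2))
    return C_true @ S_true
```

Calling `C, S = mcr_als(synthetic_mixture(), n_components=2)` returns abundance profiles in the columns of `C` and the resolved component spectra in the rows of `S`. In an analysis such as the one described here, the per-cell values in `C` for a given component would be the quantities compared between cell groups.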