Artificial Intelligence in Plasma Cell Myeloma: Neural Networks and Support Vector Machines in the Classification of Plasma Cell Myeloma Data at Diagnosis

J Pathol Inform. 2021 Sep 16;12:35. doi: 10.4103/jpi.jpi_26_21. eCollection 2021.

Abstract

Background: Plasma cell neoplasm and/or plasma cell myeloma (PCM) is a mature B-cell lymphoproliferative neoplasm of plasma cells that secrete a single homogeneous immunoglobulin called paraprotein or M-protein. Plasma cells accumulate in the bone marrow (BM), leading to bone destruction and BM failure. Diagnosis of PCM is based on clinical, radiologic, and pathological characteristics. The percentage of plasma cells by manual differential (bone marrow morphology), the white blood cell (WBC) count, cytogenetics, fluorescence in situ hybridization (FISH), microarray, and next-generation sequencing of BM are used in the risk stratification of newly diagnosed PCM patients. The genetics of PCM is highly complex and heterogeneous, with several genetic subtypes that have different clinical outcomes. National Comprehensive Cancer Network guidelines (version 4.2021) recommend targeted FISH analysis of plasma cells with specific DNA probes to detect genetic abnormalities for the staging of PCM. Training software to recognize risk categories and classify high-risk PCM, a novel bioinformatics-based way of addressing the current approaches, would be a significant step toward automation of PCM analysis.

Methods: A new artificial neural network (ANN) classification model was developed and tested in the Python programming language with a first data set of 301 cases and a second data set of 176 cases, for a total of 477 cases of PCM at diagnosis. A classification model was also developed with a support vector machine (SVM) algorithm in RStudio, and interactive data visuals were created using Tableau.
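
To make the ANN classification step concrete, the following is a minimal sketch of a one-hidden-layer network written in plain Python. It is not the authors' model: the abstract does not state the architecture, the input features, or the libraries used, so the two inputs (stand-ins for a scaled WBC count and a scaled BM plasma cell percentage), the labeling rule, the hyperparameters, and all function names here are assumptions chosen only for illustration. The data are synthetic and carry no clinical meaning.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def forward(model, x):
    """Return the hidden activations and the output probability for one case."""
    W1, b1, W2, b2 = model
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    out = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, out


def train_ann(X, y, hidden=4, epochs=200, lr=0.5, seed=0):
    """Train a one-hidden-layer network with per-sample gradient descent."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h, out = forward((W1, b1, W2, b2), x)
            d_out = out - t  # cross-entropy gradient at the output unit
            for j in range(hidden):
                # Back-propagate through hidden unit j before updating W2[j].
                d_h = d_out * W2[j] * h[j] * (1.0 - h[j])
                W2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_h
                for i in range(n_in):
                    W1[j][i] -= lr * d_h * x[i]
            b2 -= lr * d_out
    return W1, b1, W2, b2


def predict(model, x):
    """Return 1 (hypothetical high-risk label) or 0 for one case."""
    _, out = forward(model, x)
    return 1 if out >= 0.5 else 0


# Synthetic placeholder data: two features already scaled to [0, 1].
# The labeling rule below is arbitrary and is NOT a clinical finding.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(60)]
y = [1 if wbc + plasma_pct > 1.0 else 0 for wbc, plasma_pct in X]

# Hold out the last 20 synthetic cases for testing.
X_train, y_train = X[:40], y[:40]
X_test, y_test = X[40:], y[40:]

model = train_ann(X_train, y_train)
predictions = [predict(model, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

In practice a library implementation would normally replace the hand-written training loop; the explicit version is shown only so that each step of the forward and backward pass is visible.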

Results: The resulting ANN algorithm had 94% accuracy for the first and second data sets, with a classification summary of precision (PPV) 0.97, recall (sensitivity) 0.76, and F1 score 0.83, and an accuracy of logistic regression of 1.0. The SVM of plasma cells versus TP53 yielded 95% accuracy.
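
For readers less familiar with these metrics, the sketch below shows how precision, recall, F1 score, and accuracy are derived from the four cells of a binary confusion matrix. The counts are hypothetical and are not taken from the study; the function name `classification_summary` is likewise an illustrative assumption.

```python
def classification_summary(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # positive predictive value
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # sensitivity
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for 100 cases (illustration only, not study data).
summary = classification_summary(tp=19, fp=1, fn=6, tn=74)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

For a single class, F1 is the harmonic mean of that class's precision and recall. When scores are averaged across classes, the averaged F1 need not equal the harmonic mean of the averaged precision and recall.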

Conclusion: A novel classification model based only on specific morphological and genetic variables was developed using a machine learning algorithm, the ANN. The ANN identified an association of WBC count and BM plasma cell percentage with two of the high-risk genetic categories in the diagnostic cases of PCM. With further training and testing on additional data sets that include morphologic variables and additional genetic rearrangements, the newly developed ANN model has the potential to classify high-risk categories of PCM accurately.

Keywords: Artificial neural network; National Comprehensive Cancer Network; Next Generation Sequencing; cytogenetics; fluorescence in situ hybridization; machine learning; microarray; plasma cell myeloma; support vector machines kernel trick.