Classifying osteosarcoma patients using machine learning approaches

Annu Int Conf IEEE Eng Med Biol Soc. 2017 Jul:2017:82-85. doi: 10.1109/EMBC.2017.8036768.

Abstract

Metabolomic data analysis presents a unique opportunity to advance our understanding of osteosarcoma, a common bone malignancy for which genomic and proteomic studies have enjoyed limited success. A major goal of metabolomic studies is to classify osteosarcoma at an early stage, which is required for metastasectomy treatment. In this paper, we apply three classification methods to our metabolomic data on osteosarcoma patients, collected by the SJTU team: logistic regression, support vector machine (SVM), and random forest (RF). Performance is evaluated and compared using receiver operating characteristic (ROC) curves. All three classifiers successfully distinguish healthy controls from tumor cases. Random forest outperforms the other two in cross-validation on the training set, with accuracies of 88%, 90%, and 97% for logistic regression, SVM, and random forest, respectively. On the testing set, random forest achieved an overall accuracy of 95% and an area under the ROC curve (AUC) of 0.99.
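The abstract reports its results as accuracy, cross-validated accuracy, and ROC AUC. The paper's own code is not given here, so the following is only a minimal standard-library sketch of how these metrics are commonly computed. The function names (`accuracy`, `roc_auc`, `k_fold_indices`) are hypothetical. The fold scheme (contiguous, unshuffled folds) is an assumption, since the abstract does not state how the training set was split. The toy labels and scores are made up and are not data from the study.

```python
from collections.abc import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted label (0 = control, 1 = tumor) is correct."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise statistic.

    This equals the probability that a randomly chosen tumor case (label 1)
    gets a higher classifier score than a randomly chosen control (label 0).
    Tied scores count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Hypothetical helper: split indices 0..n-1 into k contiguous folds.

    Each fold serves once as the validation set while the remaining indices
    form the training set. Assumption: no shuffling or stratification.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits


if __name__ == "__main__":
    # Made-up toy example: two controls (0) and two tumor cases (1).
    y = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(accuracy(y, preds))  # 0.75
    print(roc_auc(y, scores))  # 0.75
    print(k_fold_indices(5, 2))  # [([3, 4], [0, 1, 2]), ([0, 1, 2], [3, 4])]
```

The pairwise AUC equals the trapezoidal area under the empirical ROC curve. It needs no threshold, which is why it complements the threshold-dependent accuracy figures. Its cost is proportional to the number of tumor-control pairs. That is acceptable for a small cohort, but a rank-based formulation scales better. In practice, folds are usually shuffled and stratified by class so that each fold keeps the control-to-tumor ratio.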

MeSH terms

  • Humans
  • Logistic Models
  • Machine Learning
  • Osteosarcoma*
  • Proteomics
  • ROC Curve
  • Support Vector Machine