Evaluating classification accuracy for modern learning approaches

Jialiang Li; Ming Gao; Ralph D'Agostino

doi:10.1002/sim.8103

Stat Med. 2019 Jun 15;38(13):2477-2503. doi: 10.1002/sim.8103. Epub 2019 Jan 30.

Authors

Jialiang Li¹,²,³, Ming Gao⁴,⁵, Ralph D'Agostino⁶

Affiliations

¹ Department of Statistics and Applied Probability, National University of Singapore, Singapore.
² Duke University-NUS Graduate Medical School, Singapore.
³ Singapore Eye Research Institute, Singapore.
⁴ Department of Mathematics, Shanghai Jiao Tong University, Shanghai, China.
⁵ Department of Statistics, University of Michigan, Ann Arbor, Michigan.
⁶ Department of Mathematics and Statistics, Boston University, Boston, Massachusetts.

PMID: 30701585
DOI: 10.1002/sim.8103

Abstract

Deep learning neural network models such as multilayer perceptron (MLP) and convolutional neural network (CNN) are novel and attractive artificial intelligence computing tools. However, evaluation of the performance of these methods is not readily available for practitioners yet. We provide a tutorial for evaluating classification accuracy for various state-of-the-art learning approaches, including familiar shallow and deep learning methods. For qualitative response variables with more than two categories, many traditional accuracy measures such as sensitivity, specificity, and area under the receiver operating characteristic curve are not applicable and we have to consider their extensions properly. In this paper, a few important statistical concepts for multicategory classification accuracy are reviewed and their utilities for various learning algorithms are demonstrated with real medical examples. We offer problem-based R code to illustrate how to perform these statistical computations step by step. We expect that such analysis tools will become more familiar to practitioners and receive broader applications in biostatistics.

Keywords: R package; convolutional neural net; deep learning; multilayer perceptron; mxnet; neural network.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Biopsy, Fine-Needle
Biostatistics / methods*
Breast Neoplasms / pathology
Decision Trees
Deep Learning*
Discriminant Analysis
Female
Humans
Leukemia / genetics
Logistic Models
Probability
Support Vector Machine