Classification using ensemble learning under weighted misclassification loss

Yizhen Xu; Tao Liu; Michael J Daniels; Rami Kantor; Ann Mwangi; Joseph W Hogan

doi:10.1002/sim.8082

Stat Med. 2019 May 20;38(11):2002-2012. doi: 10.1002/sim.8082. Epub 2019 Jan 4.

Authors

Yizhen Xu¹, Tao Liu¹, Michael J Daniels², Rami Kantor³, Ann Mwangi⁴ ⁵, Joseph W Hogan¹ ⁴

Affiliations

¹ Department of Biostatistics, Brown University, Providence, RI.
² Department of Statistics and Data Sciences, University of Texas at Austin, Austin, TX.
³ Division of Infectious Diseases, Brown University, Providence, RI.
⁴ Academic Model Providing Access to Healthcare (AMPATH), Eldoret, Kenya.
⁵ College of Health Sciences, School of Medicine, Eldoret, Kenya.

Abstract

Binary classification rules based on covariates typically depend on simple loss functions such as zero-one misclassification. Some cases may require more complex loss functions. For example, individual-level monitoring of HIV-infected individuals on antiretroviral therapy requires periodic assessment of treatment failure, defined as having a viral load (VL) value above a certain threshold. In some resource-limited settings, VL tests may be limited by cost or technology, and diagnoses are based on other clinical markers. Depending on the scenario, a higher premium may be placed on avoiding false positives, which bring greater cost and reduced treatment options. Here, the optimal rule is determined by minimizing a weighted misclassification loss/risk. We propose a method for finding and cross-validating optimal binary classification rules under weighted misclassification loss. We focus on rules comprising a prediction score and an associated threshold, where the score is derived using an ensemble learner. Simulations and examples show that our method, which derives the score and threshold jointly, more accurately estimates overall risk and has better operating characteristics than methods that derive the score first and the cutoff conditionally on the score, especially for finite samples.

Keywords: HIV virological failure; classification; ensemble learning; weighted misclassification loss.
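
As a rough illustration of the weighted misclassification loss described in the abstract, the sketch below computes the empirical weighted risk of a rule of the form "classify as failure if score > threshold" and picks the threshold that minimizes it for a fixed score. It is not the authors' procedure, which derives the score and threshold jointly with an ensemble learner; this is closer to the conditional approach the abstract compares against. The function names, the toy data, and the convention that the false-positive and false-negative weights sum to 1 are all assumptions made for the example.

```python
def weighted_risk(scores, labels, threshold, fp_weight):
    """Empirical weighted misclassification risk of the rule
    'predict 1 if score > threshold' (hypothetical helper).

    fp_weight is the cost of a false positive; a false negative
    costs 1 - fp_weight (an assumed normalization).
    """
    n = len(labels)
    false_pos = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    false_neg = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    return (fp_weight * false_pos + (1.0 - fp_weight) * false_neg) / n


def best_threshold(scores, labels, fp_weight):
    """Grid-search the observed scores (plus -inf, i.e. 'always predict 1')
    for the cutoff with the smallest empirical weighted risk."""
    candidates = [float("-inf")] + sorted(set(scores))
    return min(candidates, key=lambda c: weighted_risk(scores, labels, c, fp_weight))


# Toy data, invented for illustration only: one score per individual,
# label 1 = treatment failure, label 0 = no failure.
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

# Penalizing false positives heavily pushes the cutoff up;
# penalizing false negatives heavily pushes it down.
high_fp_cost_cutoff = best_threshold(scores, labels, fp_weight=0.8)
low_fp_cost_cutoff = best_threshold(scores, labels, fp_weight=0.2)
print(high_fp_cost_cutoff, low_fp_cost_cutoff)
```

On this toy data, the cutoff chosen when false positives are expensive is higher than the one chosen when false negatives are expensive, which is the qualitative behavior the weighted loss is meant to produce.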

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Biomarkers / analysis*
Breast Neoplasms
CD4 Lymphocyte Count
HIV Infections
Humans
Models, Statistical*
Reproducibility of Results
Treatment Failure
Viral Load / classification

Substances

Biomarkers
