Using Copula distributions to support more accurate imaging-based diagnostic classifiers for neuropsychiatric disorders

Ravi Bansal; Xuejun Hao; Jun Liu; Bradley S Peterson

doi:10.1016/j.mri.2014.07.011


Magn Reson Imaging. 2014 Nov;32(9):1102-13. doi: 10.1016/j.mri.2014.07.011. Epub 2014 Aug 2.

Authors

Ravi Bansal¹, Xuejun Hao², Jun Liu², Bradley S Peterson²

Affiliations

¹ Department of Psychiatry, Columbia College of Physicians & Surgeons, New York, NY 10032. Electronic address: bansalr@nyspi.columbia.edu.
² Department of Psychiatry, Columbia College of Physicians & Surgeons, New York, NY 10032.

Abstract

Many investigators have tried to apply machine learning techniques to magnetic resonance images (MRIs) of the brain in order to diagnose neuropsychiatric disorders. Usually the number of brain imaging measures (such as measures of cortical thickness and measures of local surface morphology) derived from the MRIs (i.e., their dimensionality) has been large (e.g., >10) relative to the number of participants who provide the MRI data (<100). Sparse data in a high-dimensional space increase the variability of the classification rules that machine learning algorithms generate, thereby limiting the validity, reproducibility, and generalizability of those classifiers. The accuracy and stability of the classifiers can improve significantly if the multivariate distributions of the imaging measures can be estimated accurately. To estimate these multivariate distributions accurately from sparse data, we propose first estimating the univariate distribution of each imaging measure and then combining them using a Copula to generate more accurate estimates of their multivariate distribution. We then sample the estimated Copula distributions to generate dense sets of imaging measures and use those measures to train classifiers. We hypothesize that the dense sets of brain imaging measures will generate classifiers that are stable to variations in brain imaging measures, thereby improving the reproducibility, validity, and generalizability of diagnostic classification algorithms in imaging datasets from clinical populations. In our experiments, we used both computer-generated and real-world brain imaging datasets to assess how accurately multivariate Copula distributions estimate the corresponding multivariate distributions of real-world imaging data.
Our experiments showed that diagnostic classifiers generated using imaging measures sampled from the Copula were significantly more accurate and more reproducible than were the classifiers generated using either the real-world imaging measures or their multivariate Gaussian distributions. Thus, our findings demonstrate that estimated multivariate Copula distributions can generate dense sets of brain imaging measures that can in turn be used to train classifiers, and those classifiers are significantly more accurate and more reproducible than are those generated using real-world imaging measures alone.
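The following sketch illustrates the two steps described above: first estimating each measure's univariate distribution, then coupling those distributions with a Copula and sampling a dense set of measures. The abstract does not name the Copula family or say how the univariate distributions are estimated. The sketch therefore assumes a Gaussian copula fitted to rank-based normal scores and empirical (linearly interpolated) marginals. `GaussianCopulaSampler` and its helpers are hypothetical names for illustration and do not come from the authors' code.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def _normal_scores(column):
    """Map one measure to normal scores via rank-based pseudo-observations."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps u strictly inside (0, 1), so inv_cdf stays finite
        scores[i] = _STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def _correlation(cols):
    """Pearson correlation matrix of equal-length columns."""
    d, n = len(cols), len(cols[0])
    means = [sum(c) / n for c in cols]
    cov = [[sum((cols[a][k] - means[a]) * (cols[b][k] - means[b]) for k in range(n)) / n
            for b in range(d)] for a in range(d)]
    return [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(d)]
            for a in range(d)]


def _cholesky(m, ridge=1e-9):
    """Lower-triangular Cholesky factor; a tiny ridge guards near-singular input."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(m[i][i] + ridge - s, ridge))
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


class GaussianCopulaSampler:
    """Hypothetical sketch: empirical marginals joined by a Gaussian copula."""

    def __init__(self, rows):
        cols = [list(c) for c in zip(*rows)]                 # one list per measure
        self.sorted_cols = [sorted(c) for c in cols]         # empirical marginals
        corr = _correlation([_normal_scores(c) for c in cols])
        self.chol = _cholesky(corr)                          # copula dependence

    def _quantile(self, j, u):
        """Inverse empirical CDF of measure j, linearly interpolated."""
        xs = self.sorted_cols[j]
        pos = u * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    def sample(self, n, rng=None):
        """Draw n dense samples: correlated normals -> uniforms -> marginals."""
        rng = rng or random.Random()
        d = len(self.chol)
        out = []
        for _ in range(n):
            e = [rng.gauss(0.0, 1.0) for _ in range(d)]
            z = [sum(self.chol[i][k] * e[k] for k in range(i + 1)) for i in range(d)]
            out.append([self._quantile(j, _STD_NORMAL.cdf(z[j])) for j in range(d)])
        return out
```

Under the same assumptions, a diagnostic pipeline would fit one sampler per diagnostic group and draw a dense set of measures from each. It would then train the classifier (for example, a support vector machine) on the pooled samples. Because this sketch inverts empirical marginals, sampled values never leave the observed range of each measure. A smoothed marginal estimate, such as a kernel density, would relax that limit.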

Keywords: Machine learning; Magnetic resonance imaging; Neuropsychiatric disorders; Support vector machines.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Adult
Algorithms
Brain / pathology*
Brain Mapping / methods*
Female
Humans
Image Interpretation, Computer-Assisted / methods*
Magnetic Resonance Imaging / methods*
Male
Mental Disorders / diagnosis*
Normal Distribution
ROC Curve
Reproducibility of Results
Support Vector Machine*
