Sufficient dimension reduction with additional information

Hung Hung; Chih-Yen Liu; Henry Horng-Shing Lu

doi:10.1093/biostatistics/kxv051

Biostatistics. 2016 Jul;17(3):405-21. doi: 10.1093/biostatistics/kxv051. Epub 2015 Dec 24.

Authors

Hung Hung¹, Chih-Yen Liu², Henry Horng-Shing Lu³

Affiliations

¹ Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan (hhung@ntu.edu.tw).
² Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan.
³ Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan.

PMID: 26704765

Abstract

Sufficient dimension reduction is widely applied to help model building between the response Y and covariate X. In some situations, we also collect an additional covariate W that predicts Y better than X does but is more costly to obtain. While constructing a predictive model for Y based on (X, W) is straightforward, this strategy is not applicable because W is not available for the future observations to which the constructed model is to be applied. The aim of the study is therefore to build a predictive model for Y based on X only, where the available data are (Y, X, W). A naive method is to conduct the analysis using (Y, X) directly, but ignoring W can cause inefficiency. On the other hand, it is not trivial to utilize the information in W to infer (Y, X), either. In this article, we propose a two-stage dimension reduction method for (Y, X) that is able to utilize the information in W. In the breast cancer data, the risk score constructed from the two-stage method separates patients with different survival experiences well. In the Pima data, the two-stage method requires fewer components to infer diabetes status while achieving higher classification accuracy than the conventional method.
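
For orientation, the generic sufficient dimension reduction condition can be sketched as follows, writing Y for the response and X for the covariate. This is the standard textbook formulation rather than notation taken from the article: the basis matrix B, the dimensions p and d, and the subspace symbol are assumptions introduced here purely for illustration, and the sketch does not describe the two-stage method itself.

```latex
% Generic SDR condition. B, p, d and \mathcal{S}_{Y|X} are illustrative
% symbols assumed here; they do not appear in the abstract.
% Seek a p x d matrix B, with d < p, such that
Y \perp\!\!\!\perp X \mid B^{\top} X ,
% i.e. Y depends on X only through the d linear combinations B^{\top} X.
% When it exists, the central subspace is the smallest such column space:
\mathcal{S}_{Y \mid X} \;=\; \bigcap \bigl\{ \operatorname{span}(B) : Y \perp\!\!\!\perp X \mid B^{\top} X \bigr\}.
```

In the setting of the abstract, the additional covariate is observed only in the data used to build the model, so the reduction must still be a function of X alone.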

Keywords: Additional information; Efficiency; Envelopes; Sufficient dimension reduction.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Arizona / ethnology
Breast Neoplasms / diagnosis
Breast Neoplasms / epidemiology
Data Interpretation, Statistical*
Diabetes Mellitus / diagnosis
Diabetes Mellitus / ethnology
Female
Humans
Indians, North American / ethnology
Models, Theoretical*
Risk Assessment / methods*