Low-Dimensional Density Ratio Estimation for Covariate Shift Correction

Proc Mach Learn Res. 2019 Apr:89:3449-3458.

Authors

Petar Stojanov¹, Mingming Gong², Jaime G Carbonell³, Kun Zhang⁴

Affiliations

¹ Computer Science Department, Carnegie Mellon University.
² University of Pittsburgh, Carnegie Mellon University.
³ Language Technologies Institute, Carnegie Mellon University.
⁴ Philosophy Department, Carnegie Mellon University.

PMID: 31497776
PMCID: PMC6730633

Abstract

Covariate shift is a prevalent setting for supervised learning in the wild when the training and test data are drawn from different time periods, different but related domains, or via different sampling strategies. This paper addresses a transfer learning setting with covariate shift between source and target domains. Most existing methods for correcting covariate shift exploit density ratios of the features to reweight the source-domain data. When the features are high-dimensional, the estimated density ratios may suffer from large estimation variance, leading to poor prediction performance. In this work, we investigate the dependence of covariate shift correction performance on the dimensionality of the features, and propose a correction method that finds a low-dimensional representation of the features, which takes into account the features relevant to the target Y, and exploits the density ratio of this representation for importance reweighting. We discuss the factors affecting the performance of our method and demonstrate its capabilities on both pseudo-real and real-world data.
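
As a rough illustration of why dimensionality matters for importance reweighting, the sketch below computes exact density-ratio weights for a toy Gaussian mean shift and compares the effective sample size of the reweighted source data with 1 and with 10 shifted features. It is not the method proposed in the paper: the Gaussian setup, the function names `gaussian_shift_weights` and `effective_sample_size`, and all parameter values are assumptions made here for illustration, and the weights use the true closed-form ratio rather than an estimated one. Weight degeneracy of this kind is related to, but distinct from, the estimation variance the abstract refers to.

```python
import math
import random


def gaussian_shift_weights(n, dim, shift, seed=0):
    """Draw n source points from N(0, I_dim) and return the exact density
    ratio p_target(x) / p_source(x) for a target N(shift * 1, I_dim).

    For unit-variance Gaussians the ratio has the closed form
    exp(shift * sum(x) - dim * shift**2 / 2).
    """
    rng = random.Random(seed)
    weights = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        weights.append(math.exp(shift * sum(x) - 0.5 * dim * shift ** 2))
    return weights


def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2, at most len(weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Toy comparison (assumed values): same per-feature shift, different dimension.
n_source = 5000
ess_low = effective_sample_size(gaussian_shift_weights(n_source, dim=1, shift=0.5))
ess_high = effective_sample_size(gaussian_shift_weights(n_source, dim=10, shift=0.5))

print(f"ESS with 1 shifted feature:   {ess_low:.0f} of {n_source}")
print(f"ESS with 10 shifted features: {ess_high:.0f} of {n_source}")
```

In this toy setting the weights concentrate on fewer source points as the number of shifted features grows, so fewer samples effectively contribute to a reweighted estimate. That is one intuition for computing the ratio on a low-dimensional representation instead of on the full feature vector.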

Grants and funding

U54 HG008540/HG/NHGRI NIH HHS/United States