Sparse Semiparametric Nonlinear Model with Application to Chromatographic Fingerprints

J Am Stat Assoc. 2014;109(508):1339-1349. doi: 10.1080/01621459.2013.836969.

Abstract

Traditional Chinese herbal medications (TCHMs) comprise a multitude of compounds, and identifying their active composition is an important area of research. Chromatography provides a visual representation of a TCHM sample's composition by outputting a curve characterized by spikes corresponding to compounds in the sample. Across different experimental conditions, the locations of the spikes can shift, which prevents direct comparison of curves and restricts compound identification to within each experiment. In this article, we propose a sparse semiparametric nonlinear modeling framework for establishing a standardized chromatographic fingerprint. A data-driven basis expansion models the common shape of the curves, while a parametric time warping function registers the individual curves. Penalized weighted least squares with the adaptive lasso penalty provides a unified criterion for registration, model selection, and estimation. Furthermore, the adaptive lasso estimators possess attractive sampling properties. A back-fitting algorithm is proposed for estimation. Performance is assessed through simulation, and we apply the model to chromatographic data of rhubarb collected under different experimental conditions to establish a standardized fingerprint as a first step in TCHM research.

Keywords: Adaptive lasso; Chromatography; Curve registration; Herbal medicine; Variable selection.
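
The abstract describes three ingredients: a basis expansion for the common curve shape, a parametric time warping function for registration, and an adaptive lasso penalty fitted by back-fitting. The following is a minimal illustrative sketch of how these pieces could fit together, not the article's actual algorithm. Several simplifications are assumptions made here for illustration: the basis is a fixed set of Gaussian bumps rather than a data-driven one, the warp is a pure time shift w(t; a) = t + a, all observation weights are equal, and each shift is updated by grid search. The names gaussian_basis, adaptive_lasso, and backfit are hypothetical, and the data are synthetic.

```python
import numpy as np

def gaussian_basis(t, centers, width):
    """Design matrix of Gaussian bumps at times t (assumed basis, not data-driven)."""
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)

def adaptive_lasso(X, y, lam, gamma=1.0, n_iter=200):
    """Minimize 0.5*||y - X b||^2 + lam * sum_j w_j |b_j| by coordinate descent.

    The adaptive weights w_j = 1 / |pilot_j|^gamma come from a ridge pilot fit.
    """
    p = X.shape[1]
    pilot = np.linalg.solve(X.T @ X + 1e-3 * np.eye(p), X.T @ y)
    w = 1.0 / (np.abs(pilot) ** gamma + 1e-8)
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y - X @ beta  # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]  # remove coordinate j from the fit
            rho = X[:, j] @ r
            # Soft-thresholding update with coordinate-specific threshold
            beta[j] = np.sign(rho) * max(abs(rho) - lam * w[j], 0.0) / col_sq[j]
            r -= X[:, j] * beta[j]  # put the updated coordinate back
    return beta

def backfit(t, Y, centers, width, lam, shifts_grid, n_outer=5):
    """Alternate between the shape coefficients and the per-curve shifts.

    Curve 0 serves as the reference (shift fixed at 0) for identifiability.
    """
    m = Y.shape[0]
    shifts = np.zeros(m)
    for _ in range(n_outer):
        # Step 1: given the shifts, fit the common shape on all warped curves.
        X = np.vstack([gaussian_basis(t + a, centers, width) for a in shifts])
        beta = adaptive_lasso(X, Y.ravel(), lam)
        # Step 2: given the shape, update each shift by grid search.
        for i in range(1, m):
            sse = [np.sum((Y[i] - gaussian_basis(t + a, centers, width) @ beta) ** 2)
                   for a in shifts_grid]
            shifts[i] = shifts_grid[int(np.argmin(sse))]
    return beta, shifts

# Synthetic example: two "compound" peaks observed under three time shifts.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
centers = np.linspace(0.0, 10.0, 21)
width = 0.3
beta_true = np.zeros(21)
beta_true[6], beta_true[14] = 2.0, 1.0  # peaks at t = 3 and t = 7
true_shifts = np.array([0.0, 0.4, -0.3])
Y = np.array([gaussian_basis(t + a, centers, width) @ beta_true for a in true_shifts])
Y += 0.01 * rng.standard_normal(Y.shape)

beta_hat, shift_hat = backfit(t, Y, centers, width, lam=0.05,
                              shifts_grid=np.linspace(-1.0, 1.0, 41))
```

In this sketch the same penalized least squares criterion drives both steps, which mirrors the abstract's point that one criterion handles registration, selection, and estimation. A richer warping family or unequal weights would change only the design matrix construction and the residual sums, under the assumptions stated above.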