More efficient approximation of smoothing splines via space-filling basis selection

Biometrika. 2020 Sep;107(3):723-735. doi: 10.1093/biomet/asaa019. Epub 2020 May 7.

Abstract

We consider the problem of approximating smoothing spline estimators in a nonparametric regression model. When applied to a sample of size n, the smoothing spline estimator can be expressed as a linear combination of n basis functions, requiring O(n^3) computational time when the number d of predictors is two or more. Such a sizeable computational cost hinders the broad applicability of smoothing splines. In practice, the full-sample smoothing spline estimator can be approximated by an estimator based on q randomly selected basis functions, resulting in a computational cost of O(nq^2). It is known that these two estimators converge at the same rate when q is of order O{n^{2/(pr+1)}}, where p ∈ [1, 2] depends on the true function and r > 1 depends on the type of spline. Such a q is called the essential number of basis functions. In this article, we develop a more efficient basis selection method. By selecting basis functions corresponding to approximately equally spaced observations, the proposed method chooses a set of basis functions with great diversity. The asymptotic analysis shows that the proposed smoothing spline estimator can decrease q to around O{n^{1/(pr+1)}} when d ≤ pr + 1. Applications to synthetic and real-world datasets show that the proposed method leads to a smaller prediction error than other basis selection methods.
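
The selection idea described above, choosing basis functions at approximately equally spaced observations, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which space-filling design is used, so a Halton sequence stands in for it here, and the nearest-neighbour matching step and all function names (`halton`, `rescale_to_unit_cube`, `space_filling_indices`) are assumptions made for illustration only.

```python
import random

PRIMES = [2, 3, 5, 7, 11, 13]  # one base per dimension; this sketch supports d <= 6


def van_der_corput(i, base):
    """Radical inverse of the positive integer i in the given base."""
    value, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        value += digit / denom
    return value


def halton(q, d):
    """First q points of a d-dimensional Halton sequence in [0, 1)^d."""
    return [[van_der_corput(i, PRIMES[j]) for j in range(d)]
            for i in range(1, q + 1)]


def rescale_to_unit_cube(x):
    """Min-max scale each predictor so the sample lies in [0, 1]^d."""
    d = len(x[0])
    lo = [min(row[j] for row in x) for j in range(d)]
    hi = [max(row[j] for row in x) for j in range(d)]
    span = [(h - l) or 1.0 for l, h in zip(lo, hi)]  # guard constant columns
    return [[(row[j] - lo[j]) / span[j] for j in range(d)] for row in x]


def space_filling_indices(x, q):
    """Return up to q distinct observation indices, each the nearest
    neighbour (squared Euclidean distance) of one design point."""
    u = rescale_to_unit_cube(x)
    chosen = []
    for point in halton(q, len(x[0])):
        nearest = min(
            range(len(u)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(u[i], point)),
        )
        if nearest not in chosen:  # two design points may share a neighbour
            chosen.append(nearest)
    return chosen


# Synthetic, deliberately skewed predictors (assumed data, n = 500, d = 2).
random.seed(0)
x = [[random.random() ** 2, random.random()] for _ in range(500)]
selected = space_filling_indices(x, q=20)
```

The indices in `selected` would then identify the observations whose basis functions enter the approximate estimator; the penalized least squares fit itself is not shown. The brute-force nearest-neighbour search costs O(nq) distance evaluations, which is adequate for a sketch but would normally be replaced by a tree-based search for large n.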

Keywords: Nonparametric regression; Penalized least squares criterion; Space-filling design; Star discrepancy; Subsampling.