Modeling non-linear relationships in epidemiological data: The application and interpretation of spline models

Noah A Schuster; Judith J M Rijnhart; Jos W R Twisk; Martijn W Heymans

doi:10.3389/fepid.2022.975380

Front Epidemiol. 2022 Aug 18:2:975380. doi: 10.3389/fepid.2022.975380. eCollection 2022.

Authors

Noah A Schuster¹,², Judith J M Rijnhart¹,², Jos W R Twisk¹,², Martijn W Heymans¹,²

Affiliations

¹ Amsterdam UMC location Vrije Universiteit Amsterdam, Epidemiology and Data Science, Amsterdam, Netherlands.
² Amsterdam Public Health, Methodology, Amsterdam, Netherlands.

Abstract

Objective: Traditional methods to deal with non-linearity in regression analysis often result in loss of information or compromised interpretability of the results. Spline functions are a recommended but underutilized method for modeling non-linear associations in regression models. We explain spline functions in a non-mathematical way and illustrate their application and interpretation using an empirical data example.

Methods: Using data from the Amsterdam Growth and Health Longitudinal Study, we examined the non-linear relationship between the sum of four skinfolds and VO₂max, which are measures of body fat and cardiorespiratory fitness, respectively. We compared traditional methods (i.e., quadratic regression and categorization) to spline methods [1- and 3-knot linear spline (LSP) models and a 3-knot restricted cubic spline (RCS) model] in terms of the interpretability of the results and their explained variance ($r_{adj}^{2}$).
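As an illustration of the LSP approach described above, the following minimal sketch builds a 1-knot linear spline basis, fits it by ordinary least squares, and computes the adjusted explained variance. It uses synthetic data, not the Amsterdam Growth and Health Longitudinal Study data, so its numbers do not reproduce the results reported here. The function names, the knot location (60), and the simulated slopes are assumptions chosen for illustration only.

```python
import numpy as np

def linear_spline_basis(x, knots):
    """Return columns [x, (x - k1)_+, (x - k2)_+, ...] for the given knots."""
    x = np.asarray(x, dtype=float)
    cols = [x] + [np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, fitted values)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta, design @ beta

def adjusted_r2(y, fitted, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

# Synthetic exposure/outcome pair (hypothetical values, not study data):
# slope -0.30 below the knot at 60, slope -0.05 above it.
rng = np.random.default_rng(0)
x = rng.uniform(20, 120, size=300)
y = 50 - 0.30 * x + 0.25 * np.clip(x - 60, 0, None) + rng.normal(0, 2, size=300)

X = linear_spline_basis(x, knots=[60.0])
beta, fitted = fit_ols(X, y)

# In this parameterization beta[1] is the slope before the knot and
# beta[2] is the CHANGE in slope at the knot, so the slope after the
# knot is their sum.
slope_before = beta[1]
slope_after = beta[1] + beta[2]
adj_r2 = adjusted_r2(y, fitted, n_predictors=X.shape[1])
print(f"slope before knot: {slope_before:.3f}")
print(f"slope after knot:  {slope_after:.3f}")
print(f"adjusted R^2:      {adj_r2:.3f}")
```

Each coefficient in this sketch maps onto a slope (or a change in slope) over a defined exposure range, which is what makes linear spline coefficients directly interpretable.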

Results: The spline models fitted the data better than the traditional methods. Increasing the number of knots in the LSP model increased the explained variance (from $r_{adj}^{2} = 0.578$ for the 1-knot model to $r_{adj}^{2} = 0.582$ for the 3-knot model). The RCS model fitted the data best ($r_{adj}^{2} = 0.591$), but resulted in regression coefficients that are harder to interpret.
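To show why RCS coefficients are harder to interpret, the sketch below constructs a restricted cubic spline basis using the common truncated-power parameterization (as popularized by Harrell). This parameterization is an assumption: the abstract does not state which one the authors used. The knot locations (30, 60, 90) and the function name `rcs_basis` are hypothetical.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis with k knots -> k - 1 columns.

    Column 0 is x itself; the remaining k - 2 columns are cubic terms
    constructed so that the fitted curve is linear before the first
    knot and after the last knot.
    """
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def cube_plus(u):
        # Truncated cubic: u^3 where u > 0, else 0.
        return np.clip(u, 0.0, None) ** 3

    denom = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # conventional scaling, keeps columns on the scale of x
    cols = [x]
    for j in range(k - 2):
        term = (
            cube_plus(x - t[j])
            - cube_plus(x - t[-2]) * (t[-1] - t[j]) / denom
            + cube_plus(x - t[-1]) * (t[-2] - t[j]) / denom
        )
        cols.append(term / scale)
    return np.column_stack(cols)

knots = [30.0, 60.0, 90.0]          # hypothetical knot locations
grid = np.linspace(0.0, 150.0, 16)  # exposure values 0, 10, ..., 150
basis = rcs_basis(grid, knots)

# A 3-knot RCS adds only one non-linear column next to x.
print(basis.shape)
# The non-linear column is zero up to the first knot ...
print(basis[grid <= 30.0, 1])
# ... and changes linearly beyond the last knot (second differences ~ 0).
print(np.diff(basis[grid >= 90.0, 1], n=2))
```

The coefficient of the non-linear column multiplies a combination of truncated cubic terms rather than a slope over an exposure range, so an RCS model is usually read from predicted values or a plotted curve rather than from the coefficients themselves.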

Conclusion: Spline functions should be considered more often, as they are flexible and can be applied in commonly used regression analyses. RCS regression is generally recommended for prediction research (i.e., to obtain the predicted outcome for a specific exposure value), whereas LSP regression is recommended if one is interested in the effects in a population.

Keywords: epidemiological methods; linear splines; non-linearity; regression analysis; restricted cubic splines; spline functions.