Stability selection enables robust learning of differential equations from limited noisy data

Proc Math Phys Eng Sci. 2022 Jun;478(2262):20210916. doi: 10.1098/rspa.2021.0916. Epub 2022 Jun 15.

Abstract

We present a statistical learning framework for robust identification of differential equations from noisy spatio-temporal data. We address two issues that have so far limited the application of such methods: their sensitivity to noise and their need for manual parameter tuning. To do so, we propose stability-based model selection to determine the level of regularization required for reproducible inference. This avoids manual parameter tuning and improves robustness against noise in the data. Our stability selection approach, termed PDE-STRIDE, can be combined with any sparsity-promoting regression method and provides an interpretable criterion for model component importance. We show that combining stability selection with the iterative hard-thresholding algorithm from compressed sensing gives a fast and robust framework for equation inference. This framework outperforms previous approaches in accuracy, amount of data required, and robustness. We illustrate the performance of PDE-STRIDE on a range of simulated benchmark problems. We also demonstrate its applicability to real-world data by inferring, in a purely data-driven way, the protein interaction network underlying embryonic polarization in Caenorhabditis elegans. Using fluorescence microscopy images of C. elegans zygotes as input data, PDE-STRIDE learns the molecular interactions of the proteins.
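The core loop described above wraps stability selection around iterative hard thresholding (IHT). It can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' PDE-STRIDE code. The matrix `A` stands in for a hypothetical library of candidate PDE terms evaluated on the data, and `b` stands in for the time derivative. The λ grid, subsample count, iteration count, and the 0.8 selection threshold `pi_thr` are all illustrative choices, and the debiasing step is omitted. The IHT routine solves min ½‖Ax − b‖² + λ‖x‖₀ by proximal gradient descent, whose proximal map is hard thresholding at √(2·step·λ). Stability selection then records how often each term survives on random half-subsamples of the rows. It keeps the terms whose maximum selection frequency over the λ path exceeds the threshold.

```python
import numpy as np


def iht(A, b, lam, n_iter=100):
    """Iterative hard thresholding for min_x 0.5*||A x - b||^2 + lam*||x||_0.

    Minimal sketch without a debiasing step; not the PDE-STRIDE implementation.
    """
    # Step size slightly below 1/L (L = squared spectral norm of A),
    # so that no iteration increases the objective.
    step = 0.9 / np.linalg.norm(A, 2) ** 2
    # The proximal map of step*lam*||x||_0 is hard thresholding at this level.
    thresh = np.sqrt(2.0 * step * lam)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x + step * (A.T @ (b - A @ x))        # gradient step on the least-squares term
        x = np.where(np.abs(z) > thresh, z, 0.0)  # zero out small entries
    return x


def stability_selection(A, b, lambdas, n_subsamples=50, pi_thr=0.8, seed=0):
    """Selection frequency of each column over random half-subsamples and a lambda path."""
    rng = np.random.default_rng(seed)
    n, p = A.shape
    freq = np.zeros((len(lambdas), p))
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)  # random half of the rows
        As, bs = A[idx], b[idx]
        for i, lam in enumerate(lambdas):
            freq[i] += iht(As, bs, lam) != 0             # count which terms survive
    freq /= n_subsamples
    # A term is "stable" if its maximum selection frequency over the path exceeds pi_thr.
    stable = np.flatnonzero(freq.max(axis=0) > pi_thr)
    return stable, freq


# Hypothetical toy problem: 10 candidate terms, only columns 1 and 4 are active.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 10))
x_true = np.zeros(10)
x_true[[1, 4]] = [2.0, -1.5]
b = A @ x_true + 0.1 * rng.standard_normal(200)

lambdas = np.logspace(0, 2, 10)  # illustrative lambda path from 1 to 100
stable, freq = stability_selection(A, b, lambdas)
print("stable terms:", stable)
```

On this synthetic problem the sketch should flag columns 1 and 4, the two terms with nonzero true coefficients. Taking the maximum frequency over the λ path is what removes the need to hand-pick a single λ. In exchange, the lower end of the path should not be so small that noise terms begin to survive the threshold on most subsamples.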

Keywords: PAR proteins; differential equations; machine learning; sparse regression; stability selection; statistical learning theory.

Associated data

  • Figshare: 10.6084/m9.figshare.c.6016866