An accurate probabilistic step finder for time-series analysis

bioRxiv [Preprint]. 2023 Sep 22:2023.09.19.558535. doi: 10.1101/2023.09.19.558535.

Abstract

Noisy time-series data are commonly collected from sources including Förster resonance energy transfer experiments, patch clamp and force spectroscopy setups, among many others. Two of the most common paradigms for detecting discrete transitions in such time-series data are hidden Markov models (HMMs) and step-finding algorithms. HMMs, including their extensions to infinite state spaces, inherently assume that holding times in the discrete states visited are geometrically (or, loosely speaking, exponentially) distributed. Thus, the determination of step locations, especially in sparse and noisy data, is biased by HMMs toward steps that result in geometric holding times. In contrast, existing step-finding algorithms, while free of this constraint, often rely on ad hoc metrics to penalize steps recovered in time traces (by using various information criteria) and otherwise rely on approximate greedy algorithms to identify putative global optima. Here, instead, we devise a robust and general probabilistic (Bayesian) step-finding tool that neither relies on ad hoc metrics to penalize step numbers nor assumes geometric holding times in each state. As the number of steps in a time series is a priori unknown, we treat it within a Bayesian nonparametric (BNP) paradigm. We find that the method developed, Bayesian Nonparametric Step (BNP-Step), accurately determines the number and location of transitions between discrete states without any assumed kinetic model and learns the emission distribution characteristic of each state. In doing so, we verify that BNP-Step can analyze sparser data sets containing higher noise and more closely spaced states than can otherwise be resolved by current state-of-the-art methods. What is more, BNP-Step rigorously propagates measurement uncertainty into uncertainty over state transition locations, numbers, and emission levels, as characterized by the posterior. We demonstrate the performance of BNP-Step on both synthetic data and data drawn from force spectroscopy experiments.
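
The geometric holding-time assumption attributed to HMMs above can be illustrated with a minimal sketch. This is not part of BNP-Step and is not taken from the preprint. It only simulates how long a discrete-time Markov chain stays in one state when its self-transition probability is fixed. The names `simulate_dwell_times`, `geometric_pmf`, and `p_stay`, as well as the value 0.9, are hypothetical choices made for illustration.

```python
import random


def simulate_dwell_times(p_stay, n_dwells, seed=0):
    """Sample dwell times (in time steps) for a single Markov state.

    At every time step the chain remains in the state with probability
    p_stay and leaves otherwise, which is the assumption built into a
    standard discrete-time HMM.
    """
    rng = random.Random(seed)
    dwells = []
    for _ in range(n_dwells):
        length = 1  # the chain spends at least one step in the state
        while rng.random() < p_stay:
            length += 1
        dwells.append(length)
    return dwells


def geometric_pmf(k, p_stay):
    """Probability of a dwell lasting exactly k steps (k >= 1)."""
    return p_stay ** (k - 1) * (1.0 - p_stay)


p_stay = 0.9  # hypothetical self-transition probability
dwells = simulate_dwell_times(p_stay, n_dwells=20000)

mean_dwell = sum(dwells) / len(dwells)
frac_one = sum(1 for d in dwells if d == 1) / len(dwells)

# The empirical values should be close to the geometric predictions.
print("empirical mean dwell:", mean_dwell, "theory:", 1.0 / (1.0 - p_stay))
print("empirical P(dwell = 1):", frac_one, "theory:", geometric_pmf(1, p_stay))
```

Under this assumption the shortest dwell is always the most probable one and the mean dwell is 1 / (1 - p_stay) steps. A method that makes no such assumption, as the abstract describes for BNP-Step, does not favor step placements that reproduce this shape.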

Publication types

  • Preprint