Nunchaku: optimally partitioning data into piece-wise contiguous segments

Bioinformatics. 2023 Dec 1;39(12):btad688. doi: 10.1093/bioinformatics/btad688.

Abstract

Motivation: When analyzing 1D time series, scientists are often interested in identifying regions where one variable depends linearly on the other. Typically, they do so with an ad hoc, and therefore often subjective, method.

Results: Here, we develop a statistically rigorous, Bayesian approach to infer the optimal partitioning of a dataset not only into contiguous piece-wise linear segments, but also into contiguous segments described by linear combinations of arbitrary basis functions. We therefore present a general solution to the problem of identifying discontinuous change points. Focusing on microbial growth, we use the algorithm to find the range of optical density where this density is linearly proportional to the number of cells and to automatically find the regions of exponential growth for both Escherichia coli and Saccharomyces cerevisiae. For budding yeast, we are consequently able to infer the Monod constant for growth on fructose. Our algorithm lends itself to automation and high-throughput studies, increases reproducibility, and should facilitate data analyses for a broad range of scientists.
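
To make the partitioning problem concrete, the following is a minimal sketch of how a dataset could be split into a fixed number of contiguous linear segments. It is not the Nunchaku package's API, and it is not the Bayesian method described above: it swaps in a simpler least-squares criterion (total residual sum of squares, minimized by dynamic programming) and assumes the number of segments is given. The function names `segment_cost` and `partition_linear` and the `min_len` parameter are hypothetical, chosen for illustration only.

```python
import numpy as np


def segment_cost(x, y, i, j):
    """Residual sum of squares of a straight-line fit to points i..j-1."""
    A = np.column_stack([np.ones(j - i), x[i:j]])
    coef, *_ = np.linalg.lstsq(A, y[i:j], rcond=None)
    resid = y[i:j] - A @ coef
    return float(resid @ resid)


def partition_linear(x, y, n_segments, min_len=3):
    """Return the interior boundary indices that minimise the total RSS.

    Least-squares stand-in for illustration; the paper instead compares
    partitions with a Bayesian criterion.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    # cost[i, j] is the cost of fitting one line to x[i:j];
    # segments shorter than min_len are forbidden (infinite cost).
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            cost[i, j] = segment_cost(x, y, i, j)

    # best[k, j] is the minimal cost of covering x[:j] with k segments;
    # prev[k, j] records where the last of those segments starts.
    best = np.full((n_segments + 1, n + 1), np.inf)
    prev = np.zeros((n_segments + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(1, n + 1):
            cand = best[k - 1, :j] + cost[:j, j]
            i = int(np.argmin(cand))
            best[k, j] = cand[i]
            prev[k, j] = i

    # Backtrack from the end of the data to recover segment starts.
    bounds = []
    j = n
    for k in range(n_segments, 0, -1):
        j = int(prev[k, j])
        bounds.append(j)
    return sorted(bounds)[1:]  # drop the leading 0


if __name__ == "__main__":
    # Synthetic, noiseless data with discontinuous changes at indices 10 and 20.
    x = np.arange(30, dtype=float)
    y = np.where(x < 10, 2 * x, np.where(x < 20, 50 - x, 3 * x))
    print(partition_linear(x, y, n_segments=3))
```

For growth curves, the same idea would be applied to the logarithm of the optical density against time, since exponential growth appears there as a straight line. Unlike this sketch, a Bayesian treatment can also weigh how many segments the data support rather than requiring that number as input.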

Availability and implementation: The corresponding Python package, entitled Nunchaku, is available at PyPI: https://pypi.org/project/nunchaku.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Bayes Theorem
  • Reproducibility of Results
  • Saccharomyces cerevisiae
  • Software*