Compressed computations using wavelets for hidden Markov models with continuous observations

PLoS One. 2023 Jun 6;18(6):e0286074. doi: 10.1371/journal.pone.0286074. eCollection 2023.

Abstract

Compression as an accelerant of computation is increasingly recognized as an important component in engineering fast real-world machine learning methods for big data; cf. its impact on genome-scale approximate string matching. Previous work showed that compression can accelerate algorithms for Hidden Markov Models (HMM) with discrete observations, both for the classical frequentist HMM algorithms (Forward Filtering, Backward Smoothing, and Viterbi) and for Gibbs sampling in Bayesian HMM. For Bayesian HMM with continuous-valued observations, compression was shown to greatly accelerate computations for specific types of data. For instance, data from large-scale experiments interrogating structural genetic variation can be assumed to be piecewise constant with noise, or, equivalently, data generated by HMM with dominant self-transition probabilities. Here we extend the compressive computation approach to the classical frequentist HMM algorithms on continuous-valued observations, providing the first compressive approach for this problem. In a large-scale simulation study, we demonstrate empirically that in many settings compressed HMM algorithms very clearly outperform the classical algorithms, with no, or only an insignificant, effect on the computed probabilities and inferred state paths of maximal likelihood. This provides an efficient approach to big data computations with HMM. An open-source implementation of the method is available from https://github.com/lucabello/wavelet-hmms.
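To make the core idea concrete, the sketch below is a minimal illustration of compressed Forward computation on a noisy piecewise-constant signal: runs of near-constant observations are grouped into blocks, and within a block the repeated per-step Forward updates, which all use the same emission vector, collapse into a single matrix power. This is not the authors' wavelet-based implementation (see the linked repository for that); the greedy run detection, the Gaussian emission model, and all function names and parameter values are illustrative assumptions.

```python
# Minimal sketch of block-wise (compressed) Forward computation for an HMM
# with Gaussian emissions and dominant self-transitions. NOT the authors'
# wavelet-based method; the run detection below is a crude stand-in for
# wavelet-based breakpoint detection, and all parameters are hypothetical.
import numpy as np


def gaussian_pdf(y, mu, sigma):
    """Emission density of a scalar observation under each state's Gaussian."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def compress_runs(y, threshold):
    """Greedy run detection: start a new block when an observation deviates
    from the current block mean by more than `threshold`.
    Returns a list of (block_mean, block_length) pairs."""
    blocks, start = [], 0
    for t in range(1, len(y) + 1):
        if t == len(y) or abs(y[t] - y[start:t].mean()) > threshold:
            blocks.append((y[start:t].mean(), t - start))
            start = t
    return blocks


def compressed_forward_loglik(blocks, pi, A, mu, sigma):
    """Block-wise Forward recursion: within a block of length L represented by
    value v, the per-step update alpha <- (alpha @ A) * e(v) uses the same
    emission vector e(v) every step, so L steps collapse into one matrix power
    alpha <- alpha @ (A * e(v))^L. Per-block rescaling guards against underflow
    for moderate block lengths."""
    loglik, alpha = 0.0, None
    for v, length in blocks:
        e = gaussian_pdf(v, mu, sigma)
        if alpha is None:                  # first block: initial distribution + one emission
            alpha = pi * e
            length -= 1
        M = A * e                          # M[i, j] = A[i, j] * e_j(v)
        if length > 0:
            alpha = alpha @ np.linalg.matrix_power(M, length)
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha /= scale
    return loglik


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two-state HMM with dominant self-transitions, i.e. a noisy
    # piecewise-constant signal as described in the abstract.
    pi = np.array([0.5, 0.5])
    A = np.array([[0.99, 0.01],
                  [0.01, 0.99]])
    mu, sigma = np.array([0.0, 3.0]), np.array([0.5, 0.5])
    states = [0] * 300 + [1] * 300 + [0] * 300
    y = np.array([rng.normal(mu[s], sigma[s]) for s in states])
    blocks = compress_runs(y, threshold=1.5)
    print(len(blocks), "blocks instead of", len(y), "observations")
    print("approximate log-likelihood:", compressed_forward_loglik(blocks, pi, A, mu, sigma))
```

The savings in this toy version come from evaluating emission densities once per block and replacing many identical per-step updates with a single matrix power; the approximation error stems from substituting each block's representative value for its individual observations, which is small when the data are genuinely piecewise constant with noise.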

MeSH terms

  • Algorithms*
  • Bayes Theorem
  • Computer Simulation
  • Markov Chains
  • Probability

Grants and funding

The author(s) received no specific funding for this work.