Association mapping in biomedical time series via statistically significant shapelet mining

Bioinformatics. 2018 Jul 1;34(13):i438-i446. doi: 10.1093/bioinformatics/bty246.

Abstract

Motivation: Most modern intensive care units record the physiological and vital signs of patients. These data can be used to extract signatures, commonly known as biomarkers, that help physicians understand the biological complexity of many syndromes. However, most biological biomarkers suffer from either poor predictive performance or weak explanatory power. Recent developments in time series classification focus on discovering shapelets, i.e. subsequences that are most predictive in terms of class membership. Shapelets have the advantage of combining a high predictive performance with an interpretable component-their shape. Currently, most shapelet discovery methods do not rely on statistical tests to verify the significance of individual shapelets. Therefore, identifying associations between the shapelets of physiological biomarkers and patients that exhibit certain phenotypes of interest enables the discovery and subsequent ranking of physiological signatures that are interpretable, statistically validated and accurate predictors of clinical endpoints.

Results: We present a novel and scalable method for scanning time series and identifying discriminative patterns that are statistically significant. The significance of a shapelet is evaluated while considering the problem of multiple hypothesis testing and mitigating it by efficiently pruning untestable shapelet candidates with Tarone's method. We demonstrate the utility of our method by discovering patterns in three of a patient's vital signs: heart rate, respiratory rate and systolic blood pressure that are indicators of the severity of a future sepsis event, i.e. an inflammatory response to an infective agent that can lead to organ failure and death, if not treated in time.

Availability and implementation: We make our method and the scripts that are required to reproduce the experiments publicly available at https://github.com/BorgwardtLab/S3M.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers*
  • Data Mining / methods*
  • Genetic Association Studies / methods*
  • Humans
  • Software*

Substances

  • Biomarkers