A novel method leveraging time series data to improve subphenotyping and application in critically ill patients with COVID-19

Artif Intell Med. 2024 Feb:148:102750. doi: 10.1016/j.artmed.2023.102750. Epub 2023 Dec 20.

Abstract

Computational subphenotyping, a data-driven approach to understanding disease subtypes, is a prominent topic in medical research. Numerous ongoing studies are dedicated to developing advanced computational subphenotyping methods for cross-sectional data, but the potential of time-series data remains underexplored. Here, we propose a Multivariate Levenshtein Distance (MLD) that accounts for correlation among multiple discrete features in time-series data. Our algorithm has two distinct components: the MLD itself, and an optimal threshold score that enhances sensitivity in discriminating between pairs of instances. We applied the proposed distance metric with the k-means clustering algorithm to derive temporal subphenotypes from time-series data of biomarkers and treatment administrations from 1039 critically ill patients with COVID-19, and compared its effectiveness with that of standard methods. In conclusion, the Multivariate Levenshtein Distance is a novel metric for quantifying the distance between instances described by multiple discrete features over time, and it demonstrates superior clustering performance relative to competing time-series distance metrics.
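The abstract does not give the formulation of the MLD, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes each patient is a sequence of time points, each time point is a tuple of discrete feature values, the substitution cost is the fraction of features that differ, and a hypothetical `threshold` parameter treats small mismatch fractions as a match. The function names and the cost definition are assumptions made for illustration, and the sketch does not model correlation between features.

```python
from typing import Hashable, Sequence, Tuple

# One time point: a tuple of discrete feature values (assumed representation).
Step = Tuple[Hashable, ...]


def substitution_cost(a: Step, b: Step, threshold: float) -> float:
    """Cost of aligning two time points.

    Assumption (not from the paper): the cost is the fraction of features
    that differ, and fractions at or below `threshold` count as a match.
    """
    if len(a) != len(b):
        raise ValueError("time points must have the same number of features")
    if not a:
        return 0.0
    mismatch = sum(x != y for x, y in zip(a, b)) / len(a)
    return 0.0 if mismatch <= threshold else mismatch


def multivariate_edit_distance(
    s: Sequence[Step], t: Sequence[Step], threshold: float = 0.0
) -> float:
    """Levenshtein-style dynamic program over sequences of feature tuples.

    Insertions and deletions cost 1; substitutions use substitution_cost.
    """
    # prev[j] holds the distance between the first i-1 steps of s
    # and the first j steps of t.
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(
                min(
                    prev[j] + 1.0,      # delete a from s
                    curr[j - 1] + 1.0,  # insert b into s
                    prev[j - 1] + substitution_cost(a, b, threshold),
                )
            )
        prev = curr
    return prev[-1]
```

With a single feature per time point and `threshold=0.0`, this sketch reduces to the classic Levenshtein distance. A pairwise distance matrix built this way could then be passed to a clustering routine that accepts custom distances.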

Keywords: COVID-19; Electronic health records; Time-series distance metrics.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • COVID-19*
  • Critical Illness*
  • Cross-Sectional Studies
  • Humans
  • Time Factors