METHODS FOR CLUSTERING TIME SERIES DATA ACQUIRED FROM MOBILE HEALTH APPS

Nicole Tignor; Pei Wang; Nicholas Genes; Linda Rogers; Steven G Hershman; Erick R Scott; Micol Zweig; Yu-Feng Yvonne Chan; Eric E Schadt

doi:10.1142/9789813207813_0029

Pac Symp Biocomput. 2017:22:300-311. doi: 10.1142/9789813207813_0029.

Authors

Nicole Tignor¹, Pei Wang, Nicholas Genes, Linda Rogers, Steven G Hershman, Erick R Scott, Micol Zweig, Yu-Feng Yvonne Chan, Eric E Schadt

Affiliation

¹ Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

PMID: 27896984
DOI: 10.1142/9789813207813_0029

Abstract

In our recent Asthma Mobile Health Study (AMHS), thousands of asthma patients across the country contributed medical data daily through the iPhone Asthma Health App over an extended period of time. The collected data included daily self-reported asthma symptoms, symptom triggers, and real-time geographic location information. The AMHS is just one of many studies built on the many thousands of mobile health apps that now aim to improve wellness and better manage chronic disease by leveraging the passive and active collection of data from handheld smart devices. The ability to identify, from these large, prospectively collected data sets, patient groups or symptom patterns that might predict adverse outcomes such as asthma exacerbations or hospitalizations would be of significant general interest. However, conventional clustering methods cannot be applied to these types of longitudinally collected data, especially survey data actively collected from app users, because of heterogeneous patterns of missing values arising from: 1) varying survey response rates among different users, 2) varying survey response rates over time for each user, and 3) non-overlapping periods of enrollment among different users. To handle this complicated missing-data structure, we proposed a probability imputation model to infer the missing data, and we employed a consensus clustering strategy in tandem with the multiple imputation procedure. In simulation studies under a range of scenarios reflecting real data conditions, the proposed method performed favorably compared with strategies that impute missing values through low-rank matrix completion. Applying the proposed method to the asthma triggers and symptoms collected as part of the AMHS, we identified several patient groups with distinct phenotype patterns. With further validation, the methods described in this paper might be used to identify clinically important patterns in large data sets with complicated missing-data structure, improving the ability to use such data sets to identify at-risk populations for potential intervention.
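
The general idea of pairing multiple imputation with consensus clustering can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the probability imputation model, so the sketch substitutes a simple per-user Bernoulli draw as a stand-in, and the choice of k-means for each imputed data set and average-linkage clustering on the consensus matrix are likewise assumptions. All function names (simulate_reports, impute_once, consensus_cluster) and all numeric settings are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans


def simulate_reports(n_users=60, n_days=40, seed=0):
    """Toy binary daily symptom reports with three kinds of missingness."""
    rng = np.random.default_rng(seed)
    groups = np.repeat([0, 1], n_users // 2)          # two hypothetical patient groups
    symptom_rate = np.where(groups == 0, 0.15, 0.75)  # per-user symptom probability
    data = (rng.random((n_users, n_days)) < symptom_rate[:, None]).astype(float)

    days = np.arange(n_days)
    # 3) non-overlapping enrollment: each user has their own start day and duration
    start = rng.integers(0, n_days // 4, size=n_users)
    length = rng.integers(n_days // 2, n_days, size=n_users)
    enrolled = (days >= start[:, None]) & (days < (start + length)[:, None])
    # 1) response rate differs between users
    base = rng.uniform(0.5, 0.9, size=n_users)
    # 2) response rate declines over time within each user
    decay = np.exp(-0.02 * np.clip(days - start[:, None], 0, None))
    responded = rng.random((n_users, n_days)) < base[:, None] * decay

    data[~(enrolled & responded)] = np.nan
    return data, groups


def impute_once(data, rng):
    """Stand-in imputation: Bernoulli draw at each user's observed symptom rate."""
    missing = np.isnan(data)
    n_observed = (~missing).sum(axis=1)
    global_rate = np.nanmean(data)
    # Fall back to the global rate for a user with no observed days.
    user_rate = np.where(
        n_observed > 0,
        np.nansum(data, axis=1) / np.maximum(n_observed, 1),
        global_rate,
    )
    draws = (rng.random(data.shape) < user_rate[:, None]).astype(float)
    return np.where(missing, draws, data)


def consensus_cluster(data, n_clusters=2, n_imputations=20, seed=0):
    """Cluster each imputed data set, then cluster the co-assignment frequencies."""
    rng = np.random.default_rng(seed)
    n_users = data.shape[0]
    co_assigned = np.zeros((n_users, n_users))
    for m in range(n_imputations):
        completed = impute_once(data, rng)
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=m).fit_predict(completed)
        co_assigned += labels[:, None] == labels[None, :]
    consensus = co_assigned / n_imputations  # fraction of runs in which a pair co-clusters
    distance = 1.0 - consensus
    np.fill_diagonal(distance, 0.0)
    tree = linkage(squareform(distance, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust"), consensus


data, groups = simulate_reports()
labels, consensus = consensus_cluster(data)
print("fraction missing:", np.isnan(data).mean())
print("cluster sizes:", np.bincount(labels)[1:])
```

Each entry of the consensus matrix is the fraction of imputed data sets in which two users fell into the same cluster, so the final grouping reflects assignments that are stable across imputations rather than those produced by any single filled-in data set.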

MeSH terms

Asthma / classification
Asthma / diagnosis
Asthma / therapy
Cell Phone
Cluster Analysis
Computational Biology / methods
Computer Simulation
Data Collection
Humans
Mobile Applications*
Surveys and Questionnaires
Telemedicine*
Time Factors