Accuracy of Wristband Fitbit Models in Assessing Sleep: Systematic Review and Meta-Analysis

J Med Internet Res. 2019 Nov 28;21(11):e16273. doi: 10.2196/16273.

Abstract

Background: Wearable sleep monitors are of high interest to consumers and researchers because of their ability to provide estimation of sleep patterns in free-living conditions in a cost-efficient way.

Objective: We conducted a systematic review of publications reporting on the performance of wristband Fitbit models in assessing sleep parameters and stages.

Methods: In adherence with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, we comprehensively searched the Cumulative Index to Nursing and Allied Health Literature (CINAHL), Cochrane, Embase, MEDLINE, PubMed, PsycINFO, and Web of Science databases using the keyword Fitbit to identify relevant publications meeting predefined inclusion and exclusion criteria.

Results: The search yielded 3085 candidate articles. After eliminating duplicates and in compliance with inclusion and exclusion criteria, 22 articles qualified for systematic review, with 8 providing quantitative data for meta-analysis. In reference to polysomnography (PSG), nonsleep-staging Fitbit models tended to overestimate total sleep time (TST; range from approximately 7 to 67 mins; effect size=-0.51, P<.001; heterogenicity: I2=8.8%, P=.36) and sleep efficiency (SE; range from approximately 2% to 15%; effect size=-0.74, P<.001; heterogenicity: I2=24.0%, P=.25), and underestimate wake after sleep onset (WASO; range from approximately 6 to 44 mins; effect size=0.60, P<.001; heterogenicity: I2=0%, P=.92) and there was no significant difference in sleep onset latency (SOL; P=.37; heterogenicity: I2=0%, P=.92). In reference to PSG, nonsleep-staging Fitbit models correctly identified sleep epochs with accuracy values between 0.81 and 0.91, sensitivity values between 0.87 and 0.99, and specificity values between 0.10 and 0.52. Recent-generation Fitbit models that collectively utilize heart rate variability and body movement to assess sleep stages performed better than early-generation nonsleep-staging ones that utilize only body movement. Sleep-staging Fitbit models, in comparison to PSG, showed no significant difference in measured values of WASO (P=.25; heterogenicity: I2=0%, P=.92), TST (P=.29; heterogenicity: I2=0%, P=.98), and SE (P=.19) but they underestimated SOL (P=.03; heterogenicity: I2=0%, P=.66). Sleep-staging Fitbit models showed higher sensitivity (0.95-0.96) and specificity (0.58-0.69) values in detecting sleep epochs than nonsleep-staging models and those reported in the literature for regular wrist actigraphy.

Conclusions: Sleep-staging Fitbit models showed promising performance, especially in differentiating wake from sleep. However, although these models are a convenient and economical means for consumers to obtain gross estimates of sleep parameters and time spent in sleep stages, they are of limited specificity and are not a substitute for PSG.

Keywords: Fitbit; accuracy; actigraphy; comparison of performance; polysomnography; sleep diary; sleep stages; sleep tracker; validation; wearable.

Publication types

  • Meta-Analysis
  • Systematic Review

MeSH terms

  • Actigraphy / methods*
  • Female
  • Humans
  • Male
  • Sleep / physiology*
  • Wrist