Predictive analysis for identifying potentially undiagnosed post-stroke spasticity patients in United Kingdom

J Biomed Inform. 2016 Apr;60:328-33. doi: 10.1016/j.jbi.2016.02.012. Epub 2016 Feb 27.

Abstract

Purpose of the research: Spasticity is one of the well-recognized complications of stroke; it may cause pain and limit patients' ability to perform daily activities. Post-stroke spasticity also involves high management costs in terms of healthcare resources, and case-control designs are required to establish its predisposing factors and direct effects. In 'The Health Improvement Network' (THIN) database, however, such a study would not provide reliable estimates: the recorded prevalence of post-stroke spasticity was only 2%, substantially below even the most conservative previously reported estimates, so many affected patients could end up misclassified as controls. The objective of this study was to use predictive analysis techniques to determine whether there is a substantial number of potentially under-recorded patients with post-stroke spasticity.

Methods: This study used retrospective data from adult patients registered in THIN with a diagnostic code for stroke between 2007 and 2011. Two algorithmic approaches were developed and compared: a statistically validated data-trained algorithm and a clinician-trained algorithm.
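
To make the contrast between the two approaches concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: the abstract does not list the study's predictors, hyperparameters or the form of the clinician-trained algorithm, so the four binary features, the hypothetical `clinician_rule` (one possible reading of a clinician-derived algorithm, as a fixed if-then rule) and the synthetic data are invented. The data-trained side is a deliberately tiny random forest: bootstrap samples plus a random feature subset for each tree, with each tree limited to a single split (a decision stump). A real analysis would more likely use a library implementation such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter

# Hypothetical binary features for one stroke record (assumed, not the study's):
# 0 = antispastic drug prescribed, 1 = physiotherapy referral,
# 2 = pain code recorded, 3 = mobility-aid code recorded.
N_FEATURES = 4


def clinician_rule(x):
    """Hypothetical stand-in for a clinician-derived algorithm: a fixed rule."""
    return 1 if x[0] == 1 or (x[1] == 1 and x[3] == 1) else 0


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels, default=0):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, feature_ids):
    """Depth-1 tree: pick the feature in feature_ids with lowest weighted Gini."""
    fallback = majority(y)
    best = None
    for f in feature_ids:
        left = [yi for xi, yi in zip(X, y) if xi[f] == 0]
        right = [yi for xi, yi in zip(X, y) if xi[f] == 1]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            best = (score, f, majority(left, fallback), majority(right, fallback))
    return best[1:]  # (feature, label when feature is 0, label when feature is 1)


def fit_forest(X, y, n_trees=25, max_features=2, seed=0):
    """Bagged stumps, each grown on a bootstrap sample and random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # with replacement
        feats = rng.sample(range(N_FEATURES), max_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, x):
    """Majority vote over all stumps."""
    votes = [on if x[f] == 1 else off for f, off, on in forest]
    return majority(votes)


# Invented synthetic data: the label simply copies feature 2.
X = [[0, 0, i % 2, 0] for i in range(20)]
y = [row[2] for row in X]
forest = fit_forest(X, y, max_features=N_FEATURES)
print(predict_forest(forest, [0, 0, 1, 0]), clinician_rule([0, 0, 1, 0]))
```

On this contrived data the forest learns the split on feature 2 from the examples, whereas the fixed rule, which never looks at that feature, cannot. Because each tree here has only one split, choosing the feature subset per tree is equivalent to choosing it per split, as a full random forest would.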

Results: A data-trained algorithm using Random Forest showed better prediction performance than the clinician-trained algorithm, with higher sensitivity and only marginally lower specificity; overall accuracy was 75% and 72%, respectively. The data-trained algorithm predicted an additional 3912 records consistent with patients developing spasticity in the 12 months following a stroke.
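
Sensitivity, specificity and accuracy all come from the same confusion matrix, and a model can trade a little specificity for a larger gain in sensitivity while still improving overall accuracy. The sketch below shows the calculation; the two hypothetical models and their 20 labelled records are invented for illustration and are not the study's confusion matrices, which the abstract does not report.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, 1 = spasticity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarise(y_true, y_pred):
    """Sensitivity, specificity and accuracy of binary predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are flagged
        "specificity": tn / (tn + fp),  # share of non-cases left unflagged
        "accuracy": (tp + tn) / len(y_true),
    }


# Invented labels for 20 records (8 cases, 12 non-cases); not study data.
y_true = [1] * 8 + [0] * 12
model_a = [1] * 7 + [0] * 1 + [1] * 2 + [0] * 10  # more sensitive
model_b = [1] * 4 + [0] * 4 + [1] * 1 + [0] * 11  # more specific

for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, summarise(y_true, pred))
```

Accuracy is a prevalence-weighted mix of the other two, accuracy = sensitivity × prevalence + specificity × (1 − prevalence), so the same pair of models can rank differently on accuracy in a population with a different case mix.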

Conclusions: Using machine learning techniques, additional patients with unrecorded post-stroke spasticity (PSS) were identified, raising the estimated prevalence of the condition in THIN from 2% to 13%. This work shows the potential for under-reporting of PSS in primary care data and provides a method for improved identification of case and control records for future studies.
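
The abstract does not state the size of the stroke cohort, but a rough scale can be read off the reported figures under two assumptions: that the 2% and 13% share the same denominator, and that the 3912 predicted records account for the whole rise. The helper `implied_cohort_size` is hypothetical, and because both percentages are rounded, the sketch also bounds each by ±0.5 percentage points. This is a back-of-the-envelope consistency check, not a figure reported by the study.

```python
def implied_cohort_size(added_records, prev_before, prev_after):
    """Denominator N such that added_records / N equals the rise in prevalence.

    Assumes (not reported in the abstract) that both prevalences share one
    denominator and that the added records explain the entire increase.
    """
    return added_records / (prev_after - prev_before)


added = 3912  # additional records predicted, as reported in the abstract
point = implied_cohort_size(added, 0.02, 0.13)

# 2% and 13% are rounded, so allow +/- 0.5 percentage points on each.
n_max = implied_cohort_size(added, 0.025, 0.125)  # smallest rise -> largest N
n_min = implied_cohort_size(added, 0.015, 0.135)  # largest rise -> smallest N
print(round(point), round(n_min), round(n_max))
```

Under those assumptions the figures are consistent with a stroke cohort on the order of 33,000 to 39,000 records; any overlap between the predicted and previously recorded cases, or a different denominator for the two percentages, would change this.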

Keywords: Electronic medical records; Machine learning; Random forest; Spasticity; Stroke.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Area Under Curve
  • False Positive Reactions
  • Humans
  • Machine Learning
  • Muscle Spasticity / complications
  • Muscle Spasticity / diagnosis*
  • Muscle Spasticity / epidemiology
  • Prevalence
  • Regression Analysis
  • Reproducibility of Results
  • Retrospective Studies
  • Sensitivity and Specificity
  • Stroke / complications
  • Stroke / diagnosis*
  • Stroke / epidemiology
  • United Kingdom