Automated information extraction from free-text EEG reports

Annu Int Conf IEEE Eng Med Biol Soc. 2015:2015:6804-7. doi: 10.1109/EMBC.2015.7319956.

Abstract

In this study we developed a supervised learning approach that automatically detects, with high accuracy, EEG reports that describe seizures and epileptiform discharges. We manually labeled 3,277 documents as describing one or more seizures vs no seizures, and as describing epileptiform discharges vs no epileptiform discharges. We then used Naïve Bayes to develop a system that automatically classifies EEG reports into these categories. Our system consisted of normalization techniques, extraction of key sentences, and automated feature selection using cross-validation. As candidate features we used key words and special word patterns called elastic word sequences (EWS). Final feature selection was accomplished via sequential backward selection. We used cross-validation to estimate out-of-sample performance. Our automated feature selection procedure resulted in a classifier with 38 features for seizure detection and 23 features for epileptiform discharge detection. The average [95% CI] area under the receiver operating characteristic curve was 99.05 [98.79, 99.32]% for detecting reports with seizures, and 96.15 [92.31, 100.00]% for detecting reports with epileptiform discharges. The methodology described herein greatly reduces the manual labor involved in identifying large cohorts of patients for retrospective neurophysiological studies of patients with epilepsy.
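
The abstract does not define elastic word sequences or give implementation details, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes an EWS is an ordered list of words allowed to match with up to a fixed number of intervening words, and it pairs such patterns with a small Bernoulli Naïve Bayes classifier using Laplace smoothing. The function names (`ews_pattern`, `featurize`, `train_nb`, `predict_nb`), the `max_gap` parameter, the two features, and the four toy reports are all hypothetical illustrations and are not taken from the study. Normalization, key-sentence extraction, sequential backward selection, and cross-validation are omitted.

```python
import math
import re

def ews_pattern(words, max_gap=3):
    """Assumed EWS: `words` in order, at most `max_gap` other words between neighbours."""
    gap = r"(?:\W+\w+){0,%d}?\W+" % max_gap
    body = gap.join(re.escape(w) for w in words)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)

# Hypothetical candidate features: one key word and one elastic word sequence.
FEATURES = [
    ews_pattern(["seizures"]),
    ews_pattern(["no", "seizures"], max_gap=2),
]

def featurize(text):
    """Binary vector: 1 if the pattern occurs anywhere in the report, else 0."""
    return tuple(int(p.search(text) is not None) for p in FEATURES)

def train_nb(X, y, alpha=1.0):
    """Bernoulli Naive Bayes with Laplace smoothing; returns {class: (log prior, P(feature=1))}."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior = math.log(len(rows) / len(X))
        probs = [(sum(col) + alpha) / (len(rows) + 2 * alpha) for col in zip(*rows)]
        model[c] = (log_prior, probs)
    return model

def predict_nb(model, x):
    """Return the class with the highest log posterior for feature vector `x`."""
    def score(c):
        log_prior, probs = model[c]
        return log_prior + sum(math.log(p if v else 1 - p) for p, v in zip(probs, x))
    return max(model, key=score)

# Toy, invented reports (1 = describes seizures, 0 = does not).
reports = [
    ("Two electrographic seizures were recorded from the left temporal region.", 1),
    ("Frequent seizures arising from the right hemisphere.", 1),
    ("No seizures or epileptiform discharges were seen.", 0),
    ("Normal awake and asleep EEG. No clinical seizures recorded.", 0),
]
X = [featurize(text) for text, _ in reports]
y = [label for _, label in reports]
model = train_nb(X, y)

print(predict_nb(model, featurize("No definite seizures were captured.")))
print(predict_nb(model, featurize("Three seizures were captured.")))
```

The elastic gap is what lets a single pattern cover both "No seizures" and "No clinical seizures", which a fixed n-gram would treat as unrelated features. In a fuller pipeline, a feature selection loop such as the sequential backward selection mentioned above would repeatedly drop the feature whose removal least harms cross-validated performance.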

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Diagnosis, Computer-Assisted
  • Electroencephalography / methods*
  • Epilepsy / diagnosis*
  • Humans
  • Machine Learning
  • ROC Curve
  • Retrospective Studies