Identifying scientific artefacts in biomedical literature: the Evidence Based Medicine use case

J Biomed Inform. 2014 Jun:49:159-70. doi: 10.1016/j.jbi.2014.02.006. Epub 2014 Feb 14.

Abstract

Evidence Based Medicine (EBM) provides a framework that makes use of the current best evidence in the domain to support clinicians in the decision making process. In most cases, the underlying foundational knowledge is captured in scientific publications that detail specific clinical studies or randomised controlled trials. Over the course of the last two decades, research has been performed on modelling key aspects described within publications (e.g., aims, methods, results), to enable the successful realisation of the goals of EBM. A significant outcome of this research has been the PICO (Population/Problem-Intervention-Comparison-Outcome) structure, and its refined version PIBOSO (Population-Intervention-Background-Outcome-Study Design-Other), both of which provide a formalisation of these scientific artefacts. Subsequently, using these schemes, diverse automatic extraction techniques have been proposed to streamline the knowledge discovery and exploration process in EBM. In this paper, we present a Machine Learning approach that aims to classify sentences according to the PIBOSO scheme. We use a discriminative set of features that do not rely on any external resources to achieve results comparable to the state of the art. A corpus of 1000 structured and unstructured abstracts - i.e., the NICTA-PIBOSO corpus - is used for training and testing. Our best CRF classifier achieves a micro-average F-score of 90.74% and 87.21%, respectively, over structured and unstructured abstracts, which represents an increase of 25.48 percentage points and 26.6 percentage points in F-score when compared to the best existing approaches.

Keywords: Evidence Based Medicine; Machine Learning; PIBOSO; PICO; Text classification.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artifacts*
  • Evidence-Based Medicine*
  • Publishing*