Recognizing scientific artifacts in biomedical literature

Biomed Inform Insights. 2013 Apr 2;6:15-27. doi: 10.4137/BII.S11572. Print 2013.

Abstract

Today's search engines and digital libraries offer little or no support for discovering the scientific artifacts (hypotheses, supporting/contradicting statements, or findings) that form the core of written scientific communication. Consequently, we currently have no means of identifying central themes within a domain, or of detecting gaps between accepted knowledge and newly emerging knowledge as a way to track the evolution of hypotheses from their incipient phases to maturity or decline. We present a hybrid machine learning approach that uses an ensemble of four classifiers to recognize scientific artifacts (ie, hypotheses, background, motivation, objectives, and findings) within biomedical research publications, as a first step toward the broader goal of automatically creating argumentative discourse networks that span multiple publications. The performance achieved by the classifiers ranges from 15.30% to 78.39%, depending on the target class. The set of features used for classification has led to promising results. Furthermore, because these features are computed strictly within the scope of a single publication, ie, without aggregating corpus-wide statistics, the ensemble of classifiers is more versatile and can be applied directly without re-training.

Keywords: conceptualization zones; information extraction; scientific artifacts.
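The abstract does not say which four classifiers form the ensemble, how their outputs are combined, or which features they use. The sketch below illustrates only the two ideas the abstract does state: sentence features computed from a single publication (no corpus-wide statistics such as IDF), and an ensemble of four classifiers labelling each sentence. Everything else is an assumption: simple majority voting stands in for the unspecified combination step, and `local_features`, `majority_vote`, `classify_publication` and the four toy rule-based classifiers are hypothetical names for illustration, not the authors' method.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a classifier maps a feature dict to a class label such as
# "hypothesis", "background", "motivation", "objective" or "finding".
Classifier = Callable[[Dict[str, float]], str]


def local_features(sentences: Sequence[str], i: int) -> Dict[str, float]:
    """Hypothetical features for sentence i, computed from its own publication only.

    No corpus-wide statistics are aggregated, so a new publication can be
    featurized on its own, without re-training or re-computing anything global.
    """
    words = sentences[i].lower().split()
    # Term counts over this publication, not over a whole corpus.
    pub_counts = Counter(w for s in sentences for w in s.lower().split())
    total = sum(pub_counts.values()) or 1
    return {
        # Relative position of the sentence within the publication (0.0 = first).
        "position": i / max(len(sentences) - 1, 1),
        "length": float(len(words)),
        # Mean within-publication relative frequency of the sentence's words.
        "local_tf": sum(pub_counts[w] for w in words) / (total * max(len(words), 1)),
        # Whether the authors speak in the first person ("we").
        "has_we": float("we" in words),
    }


def majority_vote(labels: List[str]) -> str:
    """Assumed combination rule: most common label; ties go to the label seen first."""
    counts = Counter(labels)
    best = max(counts.values())
    return next(label for label in labels if counts[label] == best)


def classify_publication(sentences: Sequence[str], ensemble: Sequence[Classifier]) -> List[str]:
    """Label every sentence by majority vote over the ensemble's predictions."""
    return [
        majority_vote([clf(local_features(sentences, i)) for clf in ensemble])
        for i in range(len(sentences))
    ]


# Toy ensemble of four hand-written rules standing in for trained classifiers.
demo_ensemble: List[Classifier] = [
    lambda f: "background" if f["position"] < 0.3 else "finding",
    lambda f: "objective" if f["has_we"] and f["position"] < 0.7 else "background",
    lambda f: "finding" if f["position"] > 0.7 else "objective",
    lambda f: "background" if f["length"] < 9 else "objective",
]

demo_pub = [
    "Protein X is known to regulate cell growth.",
    "We aim to test whether X also affects apoptosis.",
    "Our results show that X knockdown markedly increases apoptosis in vitro.",
]

print(classify_publication(demo_pub, demo_ensemble))
# → ['background', 'objective', 'finding']
```

Because every feature is derived from `demo_pub` alone, adding another publication never changes the features of an existing one. Under these assumptions, that locality is what would let a trained ensemble be applied to new papers without re-training. A real system would replace the toy rules with trained models and could use a different combination scheme (eg, weighted or probability-averaged voting).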