Identifying Clinical Study Types from PubMed Metadata: The Active (Machine) Learning Approach

Stud Health Technol Inform. 2015:216:867-71.

Authors

Adam G Dunn¹, Diana Arachi¹, Florence T Bourgeois²

Affiliations

¹ Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, NSW, Australia.
² Children's Hospital Informatics Program, Boston Children's Hospital, Boston, MA, USA.

PMID: 26262175

Abstract

We examined a process for automating the classification of articles in MEDLINE aimed at minimising manual effort without sacrificing accuracy. From 22,808 articles pertaining to 19 antidepressants, 1,000 were randomly selected and manually labelled according to article type (including randomised controlled trials, editorials, etc.). We applied a machine learning approach termed 'active learning', in which the learner (machine) selects the order in which the user (human) labels examples. Via simulation, we determined the number of articles a user needed to label to produce a classifier with at least 95% recall and 90% precision in three scenarios related to evidence synthesis. We found that the active learning process reduced the number of training instances required by 70%, 19%, and 14% in the three scenarios. The results show that the active learning method may be used in some scenarios to produce accurate classifiers that meet the needs of evidence synthesis tasks and reduce manual effort.
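The simulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifier, the features, or the query strategy, so the sketch assumes a nearest-centroid classifier, uncertainty sampling (the learner requests the label of the example it is least sure about), and a synthetic two-dimensional pool in place of PubMed metadata. The names make_pool, labels_needed, and precision_recall are hypothetical, and evaluating on the whole pool, labelled examples included, is a simplification.

```python
import random


def precision_recall(y_true, y_pred):
    """Precision and recall for boolean labels (True = target article type)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def score(x, pos_c, neg_c):
    """Positive when x is closer to the positive centroid than the negative one."""
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
    return d_neg - d_pos


def labels_needed(X, y, active, rng, min_recall=0.95, min_precision=0.90):
    """Count labels requested before both targets are met; None if never met."""
    # Assumption: start from one labelled example of each class.
    labelled = [y.index(True), y.index(False)]
    pool = [i for i in range(len(X)) if i not in labelled]
    while True:
        pos_c = centroid([X[i] for i in labelled if y[i]])
        neg_c = centroid([X[i] for i in labelled if not y[i]])
        preds = [score(x, pos_c, neg_c) > 0 for x in X]
        precision, recall = precision_recall(y, preds)
        if recall >= min_recall and precision >= min_precision:
            return len(labelled)
        if not pool:
            return None
        if active:
            # Uncertainty sampling: query the example nearest the decision boundary.
            nxt = min(pool, key=lambda i: abs(score(X[i], pos_c, neg_c)))
        else:
            # Baseline: label examples in random order.
            nxt = rng.choice(pool)
        pool.remove(nxt)
        labelled.append(nxt)


def make_pool(rng, n=200, positive_rate=0.3):
    """Synthetic stand-in for article features: two Gaussian clusters."""
    X, y = [], []
    for _ in range(n):
        is_pos = rng.random() < positive_rate
        mu = 1.5 if is_pos else -1.5
        X.append([rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)])
        y.append(is_pos)
    return X, y


rng = random.Random(0)
X, y = make_pool(rng)
n_active = labels_needed(X, y, active=True, rng=rng)
n_random = labels_needed(X, y, active=False, rng=rng)
print("labels needed (active):", n_active)
print("labels needed (random):", n_random)
if n_active and n_random:
    # Relative reduction in labelling effort, analogous to the reported percentages.
    print(f"reduction: {1 - n_active / n_random:.0%}")
```

The figures printed for this toy pool say nothing about the reported 70%, 19%, and 14% reductions; the sketch only shows how the number of labels required to reach a recall and precision target can be compared between an active and a random labelling order.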

MeSH terms

Abstracting and Indexing / methods*
Clinical Studies as Topic
Humans
MEDLINE / statistics & numerical data
Machine Learning*
PubMed* / statistics & numerical data
Randomized Controlled Trials as Topic