A Machine Learning-based Method for Question Type Classification in Biomedical Question Answering

Methods Inf Med. 2017 May 18;56(3):209-216. doi: 10.3414/ME16-01-0116. Epub 2017 Mar 31.

Abstract

Background and objective: Biomedical question type classification is an important component of an automatic biomedical question answering system. The performance of such a system depends directly on the performance of its question type classifier, which assigns a category to each question in order to determine the appropriate answer extraction algorithm. This study aims to automatically classify biomedical questions into one of four categories: (1) yes/no, (2) factoid, (3) list, and (4) summary.

Methods: In this paper, we propose a biomedical question type classification method based on machine learning approaches to automatically assign a category to a biomedical question. First, we extract features from biomedical questions using the proposed handcrafted lexico-syntactic patterns. Then, we feed these features to machine-learning algorithms. Finally, the class label is predicted using the trained classifiers.
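
The feature extraction step described above can be illustrated with a minimal sketch. The abstract does not list the actual lexico-syntactic patterns, so the regular expressions, the feature names, and the `extract_features` helper below are hypothetical stand-ins chosen only to show the general idea of turning a question into a binary feature vector.

```python
import re

# Hypothetical patterns for illustration only; the paper's actual
# handcrafted lexico-syntactic patterns are not given in the abstract.
PATTERNS = {
    "starts_with_aux": re.compile(
        r"^(is|are|was|were|does|do|did|can|could|has|have)\b", re.I
    ),
    "starts_with_wh": re.compile(r"^(what|which|who|where|when|how)\b", re.I),
    "list_cue": re.compile(r"^(list|name)\b|\bwhat are\b", re.I),
    "summary_cue": re.compile(
        r"^(describe|explain|define)\b|^what is the (role|function|mechanism)\b",
        re.I,
    ),
}

# Fixed feature order so every question maps to a vector of the same layout.
FEATURE_NAMES = sorted(PATTERNS)


def extract_features(question):
    """Return a binary vector with one entry per pattern (1 = pattern matched)."""
    q = question.strip()
    return [1 if PATTERNS[name].search(q) else 0 for name in FEATURE_NAMES]


if __name__ == "__main__":
    for q in [
        "Is imatinib effective against leukemia?",
        "List the symptoms of Rett syndrome.",
        "What is the role of TP53 in cancer?",
    ]:
        print(dict(zip(FEATURE_NAMES, extract_features(q))), q)
```

Vectors of this kind, paired with the annotated question types, could then be used to train a classifier such as an SVM (for example scikit-learn's `LinearSVC`); the abstract does not state which SVM implementation or kernel the authors used.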

Results: Experimental evaluations performed on large standard annotated datasets of biomedical questions, provided by the BioASQ challenge, demonstrated that our method exhibits significantly improved performance when compared to four baseline systems. The proposed method achieves a roughly 10-point increase over the best baseline in terms of accuracy. Moreover, the obtained results show that using handcrafted lexico-syntactic patterns as the feature source for a support vector machine (SVM) leads to the highest accuracy of 89.40%.

Conclusion: The proposed method can automatically classify BioASQ questions into one of four categories: yes/no, factoid, list, and summary. Furthermore, the results demonstrated that our method achieved better classification performance than all four baseline systems.

Keywords: Biomedical question answering; biomedical informatics; biomedical question classification; information retrieval; natural language processing.

Publication types

  • Equivalence Trial
  • Validation Study

MeSH terms

  • Biological Ontologies*
  • Information Storage and Retrieval / methods*
  • Machine Learning*
  • Natural Language Processing*
  • Pattern Recognition, Automated / methods
  • Semantics*
  • Surveys and Questionnaires / classification*