Exploring Active Learning Based on Representativeness and Uncertainty for Biomedical Data Classification

IEEE J Biomed Health Inform. 2019 Nov;23(6):2238-2244. doi: 10.1109/JBHI.2018.2881155. Epub 2018 Nov 13.

Abstract

Biomedical data, such as images and genetic sequences, are now abundant. However, most of this data remains unannotated because annotation is costly. Techniques that ease the burden of human annotation are therefore needed, and active learning strategies can serve this purpose. However, state-of-the-art active learning methods are generally not feasible for real-world datasets. Another important issue, generally neglected by these methods, is that the classifier tends to learn more and more at each iteration, yet their selection criteria do not properly exploit the classifier's knowledge. In this paper, we therefore propose an active learning approach to leverage the learning process, including a novel active learning strategy. The main difference of our strategy is that the classifier participates very actively in its own learning process. We can thus better exploit and prioritize the knowledge the classifier obtains at each iteration, using it in a more appropriate way when selecting the most informative samples. To do so, our selection criteria give significant importance to the classifications suggested by the classifier. In addition, jointly with the participation and knowledge of the classifier, we consider both uncertainty and representativeness criteria through a fine-grained analysis of the samples. Experimental results show that our novel active learning approach outperforms state-of-the-art active learning methods across several supervised classifiers. Our approach thus deals better with real-world dataset problems, balancing the tradeoff between annotation effort and higher accuracy rates.
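
As a rough illustration of how uncertainty, representativeness, and the classifier's own predictions might be combined when choosing samples to annotate, a minimal Python sketch follows. It is not the strategy proposed in the paper, whose details the abstract does not give. The function names, the margin-based uncertainty, the centroid-based representativeness, the `alpha` weight, and the round-robin over predicted classes are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def margin_uncertainty(probs):
    """Return 1 minus the gap between the two largest class probabilities.

    Higher values mean the classifier is less certain (assumed measure).
    """
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def select_batch(features, probabilities, batch_size, alpha=0.5):
    """Pick indices of unlabeled samples to send to the annotator.

    features:      feature vectors (lists of floats), one per unlabeled sample
    probabilities: class-probability vectors from the current classifier
    alpha:         hypothetical weight between uncertainty and representativeness
    """
    # Group samples by the label the current classifier predicts for them,
    # so the classifier's suggestions drive the structure of the selection.
    groups = defaultdict(list)
    for i, probs in enumerate(probabilities):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        groups[predicted].append(i)

    ranked = {}
    for label, members in groups.items():
        dim = len(features[members[0]])
        centroid = [
            sum(features[i][d] for i in members) / len(members)
            for d in range(dim)
        ]
        dists = {i: math.dist(features[i], centroid) for i in members}
        max_dist = max(dists.values()) or 1.0  # avoid division by zero
        # Representativeness: 1 at the group centroid, 0 at the farthest member.
        score = {
            i: alpha * margin_uncertainty(probabilities[i])
            + (1 - alpha) * (1 - dists[i] / max_dist)
            for i in members
        }
        ranked[label] = sorted(members, key=lambda i: score[i], reverse=True)

    # Round-robin over predicted classes so every class contributes candidates.
    selected = []
    labels = sorted(ranked)
    while len(selected) < batch_size and any(ranked[l] for l in labels):
        for l in labels:
            if ranked[l] and len(selected) < batch_size:
                selected.append(ranked[l].pop(0))
    return selected
```

In an active learning loop, the returned indices would be labeled by the annotator, moved to the training set, and the classifier retrained before the next call. Any classifier that outputs class probabilities could supply the `probabilities` argument.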

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Databases, Factual
  • Diagnosis, Computer-Assisted / methods*
  • Humans
  • Knowledge Discovery
  • Machine Learning*
  • Medical Informatics / methods*
  • Neoplasms / classification