Labels in a haystack: Approaches beyond supervised learning in biomedical applications

Patterns (N Y). 2021 Dec 10;2(12):100383. doi: 10.1016/j.patter.2021.100383.

Abstract

Recent advances in biomedical machine learning demonstrate great potential for data-driven techniques in health care and biomedical research. However, this potential has thus far been hampered by both the scarcity of annotated data in the biomedical domain and the diversity of the domain's subfields. Unsupervised learning can, by design, find unknown patterns in data, whereas supervised learning requires human annotation to reach the desired performance through training. Because supervised learning performs vastly better than unsupervised learning, the need for annotated datasets is high, but such datasets are costly and laborious to obtain. This review explores a family of approaches that lie between the supervised and unsupervised problem settings. The goal of these algorithms is to make more efficient use of the available labeled data. The advantages and limitations of each approach are addressed, and perspectives are provided.

Keywords: active learning; data annotation; data labeling; data value; machine learning; self-supervised learning; semi-supervised learning; zero-shot learning.

Publication types

  • Review