Text Classification of Cancer Clinical Trial Eligibility Criteria

Yumeng Yang; Soumya Jayaraj; Ethan Ludmir; Kirk Roberts

AMIA Annu Symp Proc. 2024 Jan 11:2023:1304-1313. eCollection 2023.

Authors

Yumeng Yang¹, Soumya Jayaraj¹, Ethan Ludmir²,³, Kirk Roberts¹

Affiliations

¹ School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
² Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
³ Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.

PMID: 38222417
PMCID: PMC10785908

Abstract

Automatic identification of clinical trials for which a patient is eligible is complicated by the fact that trial eligibility criteria are stated in natural language. A potential solution to this problem is to employ text classification methods for common types of eligibility criteria. In this study, we focus on seven common exclusion criteria in cancer trials: prior malignancy, human immunodeficiency virus, hepatitis B, hepatitis C, psychiatric illness, drug/substance abuse, and autoimmune illness. Our dataset consists of 764 phase III cancer trials with these exclusions annotated at the trial level. We experiment with common transformer models as well as a new pre-trained clinical trial BERT model. Our results demonstrate the feasibility of automatically classifying common exclusion criteria. Additionally, we demonstrate the value of a language model pre-trained specifically on clinical trials, which yields the highest average performance across all criteria.
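Because each trial is annotated at the trial level for seven exclusion criteria, the task can be read as multi-label classification: one binary label per criterion per trial. The sketch below illustrates only that framing and one way to compute an "average performance across all criteria". It assumes per-criterion F1, macro-averaged. The label keys (for example `hepatitis_b`) and the helper names `encode`, `f1`, `per_criterion_f1` and `macro_average` are hypothetical; neither the metric choice nor the code is taken from the paper, and no model is shown.

```python
# Hypothetical sketch: trial-level multi-label framing of the seven exclusion criteria.
# Label keys and the F1/macro-average choice are assumptions, not the paper's method.
from typing import Dict, List, Sequence

# The seven exclusion criteria named in the abstract (keys are hypothetical).
CRITERIA: List[str] = [
    "prior_malignancy",
    "hiv",
    "hepatitis_b",
    "hepatitis_c",
    "psychiatric_illness",
    "substance_abuse",
    "autoimmune_illness",
]


def encode(excluded: Sequence[str]) -> List[int]:
    """Turn the criteria a trial excludes into a 0/1 vector ordered like CRITERIA."""
    unknown = set(excluded) - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return [1 if c in excluded else 0 for c in CRITERIA]


def f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Binary F1 for one criterion; 0.0 if it is absent from both gold and predictions."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def per_criterion_f1(gold: List[List[int]], pred: List[List[int]]) -> Dict[str, float]:
    """Score each criterion as its own binary task across all trials."""
    return {
        c: f1([row[i] for row in gold], [row[i] for row in pred])
        for i, c in enumerate(CRITERIA)
    }


def macro_average(scores: Dict[str, float]) -> float:
    """Unweighted mean over criteria (one reading of 'average across all criteria')."""
    return sum(scores.values()) / len(scores)
```

For instance, with three toy trials, `per_criterion_f1([encode(["hiv", "hepatitis_b"]), encode(["prior_malignancy"]), encode([])], [encode(["hiv"]), encode(["prior_malignancy", "hiv"]), encode([])])` scores each criterion separately, and `macro_average` weights every criterion equally regardless of how often it appears. A frequency-weighted (micro) average would be an equally plausible alternative.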

MeSH terms

Eligibility Determination / methods
Humans
Language
Natural Language Processing
Neoplasms*