Evaluating predictive modeling algorithms to assess patient eligibility for clinical trials from routine data

BMC Med Inform Decis Mak. 2013 Dec 9;13:134. doi: 10.1186/1472-6947-13-134.

Abstract

Background: The necessity to translate eligibility criteria from free text into decision rules that are compatible with data from the electronic health record (EHR) constitutes the main challenge when developing and deploying clinical trial recruitment support systems. Recruitment decisions based on case-based reasoning, i.e. using past cases rather than explicit rules, could dispense with the need for translating eligibility criteria and could also be implemented largely independently from the terminology of the EHR's database. We evaluated the feasibility of predictive modeling to assess the eligibility of patients for clinical trials and report on a prototype's performance for different system configurations.

Methods: The prototype worked by using existing basic patient data of manually assessed eligible and ineligible patients to induce prediction models. Performance was measured retrospectively for three clinical trials by plotting receiver operating characteristic curves and comparing the area under the curve (ROC-AUC) for different prediction algorithms, different sizes of the learning set and different numbers and aggregation levels of the patient attributes.
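
As an illustration of the comparison described above, the following minimal sketch computes ROC-AUC from its rank interpretation: the probability that a randomly chosen eligible patient receives a higher score than a randomly chosen ineligible one, with ties counted as one half. The function name, the labels, and the two sets of model scores are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one eligible and one ineligible patient")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical manual screening outcomes (1 = eligible, 0 = ineligible)
labels = [1, 0, 1, 0, 0, 1, 0, 0]

# Hypothetical eligibility scores from two made-up prediction models
model_scores: Dict[str, Sequence[float]] = {
    "model_a": [0.9, 0.2, 0.7, 0.4, 0.1, 0.8, 0.3, 0.6],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.2, 0.9, 0.3, 0.1],
}

# Compare models by their AUC on the same set of patients
auc_by_model = {name: roc_auc(labels, s) for name, s in model_scores.items()}
best_model = max(auc_by_model, key=auc_by_model.get)
print(auc_by_model, best_model)
```

The pairwise form is quadratic in the number of patients, which is acceptable for a few hundred screened cases; a rank-based implementation would likely be preferable for larger cohorts.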

Results: Random forests were generally among the best-performing models, with a maximum ROC-AUC of 0.81 (CI: 0.72-0.88) for trial A, 0.96 (CI: 0.95-0.97) for trial B and 0.99 (CI: 0.98-0.99) for trial C. The full potential of this algorithm was reached after learning from approximately 200 manually screened patients (eligible and ineligible). Neither block- nor category-level aggregation of diagnosis and procedure codes influenced the algorithms' performance substantially.
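
The abstract does not state how the confidence intervals were derived. One common option, sketched below purely as an assumption, is a percentile bootstrap over patients: resample the screened cases with replacement, recompute the AUC each time, and take the empirical quantiles. All names and data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (eligible, ineligible) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    total = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs: List[float] = []
    while len(aucs) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, so AUC is undefined; draw again
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    low = aucs[int((alpha / 2) * n_resamples)]
    high = aucs[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Hypothetical screening outcomes (1 = eligible) and model scores
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.6, 0.5, 0.4, 0.7, 0.2, 0.9, 0.3, 0.1]

point = roc_auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(f"AUC {point:.2f} (95% CI: {ci_low:.2f}-{ci_high:.2f})")
```

With only eight hypothetical patients the interval is very wide; intervals as narrow as those reported would require far larger evaluation sets.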

Conclusions: Our results indicate that predictive modeling is a feasible approach to support patient recruitment into clinical trials. Its major advantages over the commonly applied rule-based systems are its independence from the concrete representation of eligibility criteria and EHR data and its potential for automation.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Clinical Trials as Topic / standards*
  • Clinical Trials as Topic / statistics & numerical data
  • Electronic Health Records / standards*
  • Electronic Health Records / statistics & numerical data
  • Eligibility Determination / standards*
  • Eligibility Determination / statistics & numerical data
  • Feasibility Studies
  • Humans
  • Models, Theoretical*
  • Patient Selection*
  • Predictive Value of Tests