Clustering clinical trials with similar eligibility criteria features

J Biomed Inform. 2014 Dec:52:112-20. doi: 10.1016/j.jbi.2014.01.009. Epub 2014 Feb 1.

Abstract

Objectives: To automatically identify and cluster clinical trials with similar eligibility features.

Methods: Using the public repository ClinicalTrials.gov as the data source, we extracted semantic features from the eligibility criteria text of all clinical trials and constructed a trial-feature matrix. We calculated the pairwise similarities for all clinical trials based on their eligibility features. Taking each trial in turn as the center, we identified the trials whose similarity to that central trial was greater than or equal to a predefined threshold and grouped them into a center-based cluster. We then identified unique trial sets, those with distinct trial membership, from the center-based clusters by disregarding their structural information.
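The center-based clustering step described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the similarity measure, so Jaccard similarity over sets of features is assumed here, and the trial identifiers, feature strings, and function names are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets (assumed measure, not from the paper)."""
    union = a | b
    if not union:
        return 0.0  # two empty feature sets are treated as dissimilar
    return len(a & b) / len(union)


def center_based_clusters(trial_features, threshold):
    """Take each trial in turn as the center and collect every other trial
    whose similarity to that center is >= threshold.
    Centers with no similar trial produce no cluster."""
    clusters = {}
    for center, center_feats in trial_features.items():
        members = {
            other
            for other, feats in trial_features.items()
            if other != center and jaccard(center_feats, feats) >= threshold
        }
        if members:
            clusters[center] = members | {center}
    return clusters


def unique_trial_sets(clusters):
    """Ignore which trial was the center and keep only distinct memberships."""
    return {frozenset(members) for members in clusters.values()}


# Toy data: hypothetical trial IDs and eligibility features.
trial_features = {
    "T1": {"diabetes", "age>=18", "hba1c"},
    "T2": {"diabetes", "age>=18", "hba1c"},
    "T3": {"diabetes", "age>=18", "pregnancy"},
    "T4": {"asthma"},
}

clusters = center_based_clusters(trial_features, threshold=0.9)
unique = unique_trial_sets(clusters)
print(len(clusters), "center-based clusters;", len(unique), "unique trial set(s)")
```

In this toy example T1 and T2 each act as the center of a cluster with the same two members, so the two center-based clusters collapse into a single unique trial set. The pairwise loop is quadratic in the number of trials; at the scale reported in the abstract, a sparse trial-feature matrix would likely be needed instead.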

Results: From the 145,745 clinical trials on ClinicalTrials.gov, we extracted 5,508,491 semantic features. Of these, 459,936 were unique and 160,951 were shared by at least one pair of trials. By crowdsourcing the cluster evaluation on Amazon Mechanical Turk (MTurk), we identified 0.9 as the optimal similarity threshold. Using this threshold, we generated 8,806 center-based clusters. Evaluation of a sample of the clusters by MTurk resulted in a mean score of 4.331±0.796 on a scale of 1-5 (5 indicating "strongly agree that the trials in the cluster are similar").

Conclusions: We contribute an automated approach to clustering clinical trials with similar eligibility features. This approach is potentially useful for investigating knowledge reuse patterns in clinical trial eligibility criteria designs and for improving clinical trial recruitment. We also contribute an effective crowdsourcing method for evaluating informatics interventions.

Keywords: Clinical trial; Cluster analysis; Medical informatics.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Clinical Trials as Topic / classification*
  • Cluster Analysis*
  • Data Mining
  • Humans
  • Medical Informatics / methods*
  • Semantics*