Training an automated circulating tumor cell classifier when the true classification is uncertain

PNAS Nexus. 2024 Feb 9;3(2):pgae048. doi: 10.1093/pnasnexus/pgae048. eCollection 2024 Feb.

Abstract

Circulating tumor cell (CTC) and tumor-derived extracellular vesicle (tdEV) loads are prognostic factors of survival in patients with carcinoma. The current method of CTC enumeration relies on operator review and, unfortunately, has only moderate interoperator agreement (Fleiss' kappa 0.60) because CTC-like events are difficult to classify. We compared operator review, ACCEPT automated image processing, and a refined deep-learning algorithm for identifying CTC and tdEV to predict survival in patients with metastatic and nonmetastatic cancers. Operator review is defined only for CTC. The deep-learning output was refined by automatic contrast maximization (CM) of events detected in cancer and in benign samples (CM-CTC). We used 418 samples from benign diseases, 6,293 from nonmetastatic breast cancer, 2,408 from metastatic breast cancer, and 698 from metastatic prostate cancer to train, test, optimize, and evaluate CTC and tdEV enumeration. For CTC identification, CM-CTC performed best in metastatic/nonmetastatic breast cancer, with hazard ratios (HRs) for overall survival of 2.6/2.1, respectively, vs. 2.4/1.4 for operator CTC and 1.2/0.8 for ACCEPT-CTC. For tdEV identification, CM-tdEV performed best, with HRs of 1.6/2.9 vs. 1.5/1.0 for ACCEPT-tdEV. In conclusion, contrast maximization is effective even though it does not use domain knowledge.
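Fleiss' kappa, the interoperator agreement statistic quoted in the abstract, compares the mean fraction of agreeing rater pairs per event, P̄, with the agreement expected by chance, P_e, as κ = (P̄ − P_e) / (1 − P_e). The minimal Python sketch below computes it from a count matrix in which each row is one reviewed event and each column one category (for example "CTC" vs. "not CTC"). The function name `fleiss_kappa` and the toy ratings are hypothetical illustrations, not code or data from the study.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects-by-categories count matrix.

    counts[i][j] is the number of raters (operators) who assigned
    subject (event) i to category j. Every subject must be rated by
    the same number n >= 2 of raters.
    """
    N = len(counts)
    if N == 0:
        raise ValueError("need at least one subject")
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    k = len(counts[0])

    # Share of all assignments that fall into each category.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]

    # Observed agreement per subject: fraction of rater pairs that agree.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N

    # Agreement expected if raters assigned categories at random
    # with the observed category proportions.
    P_e = sum(pj * pj for pj in p)
    if P_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (P_bar - P_e) / (1 - P_e)


# Hypothetical example: 4 events, 3 operators, columns = [CTC, not CTC].
toy_counts = [[3, 0], [2, 1], [0, 3], [1, 2]]
print(round(fleiss_kappa(toy_counts), 3))  # → 0.333
```

Here κ = 1 means perfect agreement and κ = 0 means agreement no better than chance, so the reported value of 0.60 sits between the two.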

Keywords: automated classifier; circulating tumor cell; label uncertainty; prognostic power; tumor-derived extracellular vesicle.