Training an automated circulating tumor cell classifier when the true classification is uncertain

Afroditi Nanou; Nikolas H Stoecklein; Daniel Doerr; Christiane Driemel; Leon W M M Terstappen; Frank A W Coumans

doi:10.1093/pnasnexus/pgae048

PNAS Nexus. 2024 Feb 9;3(2):pgae048. doi: 10.1093/pnasnexus/pgae048. eCollection 2024 Feb.

Authors

Afroditi Nanou¹, Nikolas H Stoecklein², Daniel Doerr³, Christiane Driemel², Leon W M M Terstappen¹ ⁴, Frank A W Coumans¹ ⁴

Affiliations

¹ Department of Medical Cell BioPhysics, Faculty of Science and Technology, University of Twente, Enschede 7522 NH, The Netherlands.
² Department of General, Visceral and Pediatric Surgery, Heinrich-Heine University, University Hospital Düsseldorf, Düsseldorf 40225, Germany.
³ Institute for Medical Biometry and Bioinformatics, Heinrich Heine University, Düsseldorf, Germany.
⁴ Decisive Science, Amsterdam 1019 BB, The Netherlands.

Abstract

Circulating tumor cell (CTC) and tumor-derived extracellular vesicle (tdEV) loads are prognostic factors of survival in patients with carcinoma. The current method of CTC enumeration relies on operator review and has only moderate interoperator agreement (Fleiss' kappa 0.60) because CTC-like events are difficult to classify. We compared three approaches to identifying CTC and tdEV for the prediction of survival in patients with metastatic and nonmetastatic cancers: operator review, ACCEPT automated image processing, and a refined output of a deep-learning algorithm. Operator review is only defined for CTC. Refinement was performed using automatic contrast maximization (CM) of events detected in cancer and in benign samples; the refined classifications are denoted CM-CTC and CM-tdEV. We used 418 samples from benign diseases, 6,293 from nonmetastatic breast, 2,408 from metastatic breast, and 698 from metastatic prostate cancer to train, test, optimize, and evaluate CTC and tdEV enumeration. For CTC identification, CM-CTC performed best in metastatic/nonmetastatic breast cancer, with hazard ratios (HR) for overall survival of 2.6/2.1, vs. 2.4/1.4 for operator CTC and 1.2/0.8 for ACCEPT-CTC. For tdEV identification, CM-tdEV performed best, with HRs of 1.6/2.9 vs. 1.5/1.0 for ACCEPT-tdEV. In conclusion, contrast maximization is effective even though it does not utilize domain knowledge.

Keywords: automated classifier; circulating tumor cell; label uncertainty; prognostic power; tumor-derived extracellular vesicle.
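
The interoperator agreement quoted in the abstract is a Fleiss' kappa, which compares the observed agreement among several raters with the agreement expected by chance. The following is a minimal sketch of the standard Fleiss' kappa formula in Python, not the authors' code; the function name `fleiss_kappa`, the labels "CTC" and "other", and the toy ratings are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings: list of lists; ratings[i] holds the labels given to item i.
    categories: list of all possible labels.
    Note: undefined (division by zero) if every rating falls in one category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])

    # counts[i][j] = number of raters who assigned item i to category j
    counts = [[Counter(item)[c] for c in categories] for item in ratings]

    # Observed agreement: fraction of agreeing rater pairs, averaged over items
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each category
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_expected = sum(p * p for p in p_cat)

    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: 4 candidate events, each scored by 3 operators
toy_ratings = [
    ["CTC", "CTC", "CTC"],
    ["CTC", "CTC", "other"],
    ["other", "other", "other"],
    ["CTC", "other", "other"],
]
kappa = fleiss_kappa(toy_ratings, ["CTC", "other"])
print(round(kappa, 3))  # 0.333
```

On this toy data the operators agree fully on two events and split 2:1 on the other two, which gives a kappa of about 0.33. A value of 1 would indicate perfect agreement and 0 agreement no better than chance.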