Multi-View Data Integration Methods for Radiotherapy Structure Name Standardization

Cancers (Basel). 2021 Apr 9;13(8):1796. doi: 10.3390/cancers13081796.

Abstract

Standardization of radiotherapy structure names is essential for developing data-driven personalized radiotherapy treatment plans. Several types of data are associated with radiotherapy structures, including physician-given text labels, geometric (image) data, and dose-volume histograms (DVHs). Prior work on structure name standardization used only one type of data. We present novel approaches that integrate complementary types (views) of structure data to build better-performing machine learning models. Specifically, we present two methods, (a) intermediate integration and (b) late integration, for combining physician-given textual structure name features with geometric information about the structures. The dataset consisted of 709 prostate cancer and 752 lung cancer patients across 40 radiotherapy centers administered by the U.S. Veterans Health Administration (VA) and the Department of Radiation Oncology, Virginia Commonwealth University (VCU). We used randomly selected data from 30 centers for training and from 10 centers for testing; the VCU data were also used for testing. The intermediate integration approach outperformed models trained on a single view of the dataset, while late integration performed comparably to the single-view models. These results demonstrate that combining different views (types of data) helps build better models for structure name standardization, enabling big data analytics in radiation oncology.

Keywords: TG-263; image classification; machine learning; multi-view data integration; radiotherapy structure names; text categorization; weighting techniques.
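
Illustrative sketch: the abstract names two integration strategies without detailing them. The Python below shows one common reading of each, under stated assumptions: intermediate integration is taken to mean joining per-structure feature vectors from both views before a single classifier is trained, and late integration is taken to mean combining the class probabilities of two separately trained single-view models. The function names, the `text_weight` parameter, the toy vectors, and the two-label set are hypothetical and are not taken from the study.

```python
# Toy illustration only: the vectors and probabilities below are made up.
LABELS = ["Bladder", "Rectum"]  # hypothetical two-class label set


def intermediate_integration(text_features, geometry_features):
    """Feature-level fusion (assumed reading): concatenate the text and
    geometry feature vectors of each structure into one joint vector,
    which a single classifier would then be trained on."""
    return [t + g for t, g in zip(text_features, geometry_features)]


def late_integration(text_probs, geometry_probs, text_weight=0.5):
    """Decision-level fusion (assumed reading): weighted average of the
    per-class probabilities from two single-view models."""
    fused = []
    for p_text, p_geom in zip(text_probs, geometry_probs):
        fused.append([
            text_weight * a + (1.0 - text_weight) * b
            for a, b in zip(p_text, p_geom)
        ])
    return fused


def predict(probs, labels):
    """Pick the label with the highest fused probability for each structure."""
    return [labels[max(range(len(row)), key=row.__getitem__)] for row in probs]


# Two structures, each with a 2-d text vector and a 3-d geometry vector.
text_features = [[1.0, 0.0], [0.0, 1.0]]
geometry_features = [[0.2, 0.7, 0.1], [0.9, 0.1, 0.0]]
joint_features = intermediate_integration(text_features, geometry_features)

# Class probabilities from two hypothetical single-view models.
text_probs = [[0.6, 0.4], [0.3, 0.7]]
geometry_probs = [[0.1, 0.9], [0.4, 0.6]]
fused_probs = late_integration(text_probs, geometry_probs)
fused_labels = predict(fused_probs, LABELS)
```

In this toy case the text-only model would label the first structure "Bladder", while the fused probabilities favor "Rectum", which shows how a second view can change a decision. Whether that change is an improvement depends on the data, as the differing results for the two methods in the abstract suggest.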