Effective Invasiveness Recognition of Imbalanced Data by Semi-Automated Segmentations of Lung Nodules

Biomedicines. 2023 Oct 30;11(11):2938. doi: 10.3390/biomedicines11112938.

Abstract

Over the past few decades, the recognition of early lung cancers has been studied extensively with the aim of enabling effective treatment. In early lung cancers, invasiveness is an important factor in expected survival rates. Hence, how to effectively identify invasiveness from computed tomography (CT) images has become a hot topic in biomedical science. Although a number of previous works have been shown to be effective on this topic, some problems remain unsettled. First, better prediction requires a large amount of annotated data, but the cost of manual annotation is high. Second, accuracy is limited on imbalanced data. To alleviate these problems, in this paper we propose an effective CT invasiveness recognizer based on semi-automated segmentation. With semi-automated segmentation, it is easy for doctors to mark the nodules: from a single clicked pixel, a nodule object in a CT image can be marked by fusing two proposed segmentation methods, namely thresholding-based morphology and a deep learning-based mask region-based convolutional neural network (Mask-RCNN). For thresholding-based morphology, an initial segmentation is derived by adaptive pixel connections, and mathematical morphology is then applied to refine it. For the deep learning-based Mask-RCNN, the anchor is fixed by the clicked pixel to reduce the computational complexity. To incorporate the advantages of both, the segmentation is switched between these two sub-methods. After the nodules are segmented, a boosting ensemble classification model with feature selection and equalized down-sampling is executed to identify the invasiveness. Extensive experimental results on a real dataset reveal that the proposed segmentation method performs better than traditional segmentation methods, with an average Dice improvement that can reach 392.3%. Additionally, the proposed ensemble classification model achieves better performance than the compared method, with an area under the curve (AUC) improvement that can reach 5.3% and a specificity improvement that can reach 14.3%. Moreover, in comparison with models using imbalanced data, the improvements in AUC and specificity can reach 10.4% and 33.3%, respectively.
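As an illustration of the click-driven, thresholding-based morphology idea described above, the following is a minimal sketch and not the authors' implementation. It assumes that "adaptive pixel connections" can be approximated by keeping the connected component of pixels whose intensity lies within a fixed tolerance of the clicked pixel, followed by a morphological closing and hole filling. The function names, the `tolerance` parameter, and the synthetic phantom are hypothetical and chosen only for this example. The Dice coefficient used for the segmentation comparison is computed in its standard form, 2|A ∩ B| / (|A| + |B|).

```python
import numpy as np
from scipy import ndimage


def segment_from_click(image, seed, tolerance=0.15, closing_iterations=1):
    """Grow a region around one clicked pixel, then refine it morphologically.

    `tolerance` is a hypothetical fixed intensity window; the paper's adaptive
    rule is not specified in the abstract.
    """
    seed_value = image[seed]
    # Thresholding: keep pixels whose intensity is close to the clicked pixel.
    candidate = np.abs(image - seed_value) <= tolerance
    # Pixel connection: keep only the connected component containing the seed.
    labels, _ = ndimage.label(candidate)
    region = labels == labels[seed]
    # Morphology: closing bridges small gaps, hole filling removes interior voids.
    region = ndimage.binary_closing(region, iterations=closing_iterations)
    return ndimage.binary_fill_holes(region)


def dice(pred, truth):
    """Dice coefficient between two binary masks: 2|A and B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # Both masks empty: treat as perfect agreement.
    return 2.0 * np.logical_and(pred, truth).sum() / total


def make_phantom(size=64):
    """Synthetic slice: one 'nodule' disk plus a disconnected bright blob."""
    rows, cols = np.mgrid[:size, :size]
    nodule = (rows - 32) ** 2 + (cols - 32) ** 2 <= 8 ** 2
    distractor = (rows - 10) ** 2 + (cols - 10) ** 2 <= 3 ** 2
    image = np.zeros((size, size), dtype=float)
    image[nodule | distractor] = 1.0
    return image, nodule


image, truth = make_phantom()
mask = segment_from_click(image, seed=(32, 32))
print(f"Dice vs. ground truth: {dice(mask, truth):.3f}")
```

In this sketch the single click both sets the intensity window and selects which connected component is kept, so the disconnected bright blob is excluded even though it passes the threshold. A real CT pipeline would work on Hounsfield units and would need a data-driven tolerance.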

Keywords: biomedical science; imbalance data; invasiveness recognition; lung cancer; semi-automated segmentation.