A Machine Learning-Based Investigation of Gender-Specific Prognosis of Lung Cancers

Medicina (Kaunas). 2021 Jan 22;57(2):99. doi: 10.3390/medicina57020099.

Abstract

Background and objective: Primary lung cancer is a lethal and rapidly developing cancer type and one of the leading causes of cancer deaths.

Materials and methods: Statistical methods such as Cox regression are usually used to detect the prognostic factors of a disease. This study instead investigated survival prediction using machine learning algorithms. The clinical data of 28,458 patients with primary lung cancers were collected from the Surveillance, Epidemiology, and End Results (SEER) database.
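
Framing survival prediction as classification requires a binary label per patient and time horizon. The abstract does not state how these labels were derived, so the following is only a minimal sketch under assumed conventions: the function name, the record layout, and the choice to leave censored patients unlabeled are all hypothetical, not the authors' method.

```python
from typing import Optional


def survival_label(months_followed: float, died: bool, horizon_years: int) -> Optional[int]:
    """Assumed labeling rule (not taken from the paper).

    Returns 1 if the patient is known to have survived to the horizon,
    0 if the patient died before it, and None if follow-up ended before
    the horizon without a death (censored, so the outcome is unknown).
    """
    horizon_months = 12 * horizon_years
    if months_followed >= horizon_months:
        return 1
    if died:
        return 0
    return None


# Hypothetical records: (months of follow-up, died during follow-up)
patients = [(8, True), (40, False), (20, False), (70, True)]

# One label list per prediction horizon (one, three, and five years)
labels = {h: [survival_label(m, d, h) for m, d in patients] for h in (1, 3, 5)}
```

Under this assumed rule, patients labeled None would be excluded from the training set for that horizon, so the usable sample shrinks as the horizon lengthens.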

Results: This study indicated that the survival rate of women with primary lung cancer was generally higher than that of men (p < 0.001). Seven popular machine learning algorithms were evaluated for one-year, three-year, and five-year survival prediction. Two classifiers, extreme gradient boosting (XGB) and logistic regression (LR), achieved the best prediction accuracies. The variable importance of the trained XGB models suggested that surgical removal (feature "Surgery") made the largest contribution to the one-year survival prediction models, while the metastatic status of the regional lymph nodes (feature "N" stage) was the most important contributor to three-year and five-year survival prediction. The female patients' three-year prognosis model achieved a prediction accuracy of 0.8297 on the independent future samples, while the male model achieved an accuracy of only 0.7329.
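
The gender-specific accuracies reported above are simply the fraction of correct predictions within each group. As an illustration only, the sketch below computes accuracy separately per group; the helper name and the toy labels are hypothetical and are not drawn from the SEER data or from the authors' code.

```python
from collections import defaultdict


def accuracy_by_group(groups, y_true, y_pred):
    """Return {group: fraction of predictions that match the true label}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


# Hypothetical toy data: 1 = survived the horizon, 0 = did not
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]

result = accuracy_by_group(sex, y_true, y_pred)
```

Reporting accuracy per group rather than pooled keeps a gap such as the one between the female and male models visible instead of averaging it away.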

Conclusions: These data suggested that male patients may have more complicated prognostic factors in lung cancer than female patients, and that it is necessary to develop gender-specific diagnosis and prognosis models.

Keywords: gender; lung cancer; machine learning; prognosis; survival prediction.

MeSH terms

  • Algorithms
  • Female
  • Humans
  • Logistic Models
  • Lung Neoplasms* / diagnosis
  • Machine Learning*
  • Male
  • Prognosis