Spatial Clusters of Cancer Mortality in Brazil: A Machine Learning Modeling Approach

Int J Public Health. 2023 Jul 20;68:1604789. doi: 10.3389/ijph.2023.1604789. eCollection 2023.

Abstract

Objectives: Our aim was to test if machine learning algorithms can predict cancer mortality (CM) at an ecological level and use these results to identify statistically significant spatial clusters of excess cancer mortality (eCM).

Methods: Age-standardized CM was extracted from the official databases of Brazil. Predictive features included sociodemographic and health coverage variables. Machine learning algorithms were selected and trained with 70% of the data, and the performance was tested with the remaining 30%. Clusters of eCM were identified using SaTScan. Additionally, separate analyses were performed for the 10 most frequent cancer types.

Results: The gradient boosting trees algorithm presented the highest coefficient of determination (R² = 0.66). For total cancer, all algorithms overlapped in the region of Bagé (27% eCM). For esophageal cancer, all algorithms overlapped in west Rio Grande do Sul (48%-96% eCM). The most significant cluster for stomach cancer was in Macapá (82% eCM). The most important variables were the percentage of the white population and residents with computers.

Conclusion: We found consistent and well-defined geographic regions in Brazil with significantly higher than expected cancer mortality.
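
The evaluation scheme described in the Methods (train on 70% of the data, score on the remaining 30% with the coefficient of determination) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic, a one-variable least-squares line stands in for the gradient boosting trees and other algorithms used in the study, and the helper names (`train_test_split`, `fit_line`, `r_squared`, `excess_percent`) are hypothetical. The abstract does not give the formula for eCM; the sketch assumes it is the percentage by which observed mortality exceeds the model's expected value.

```python
import random


def train_test_split(rows, train_frac=0.7, seed=0):
    """Shuffle a copy of rows and split it into train and test parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Ordinary least squares for y = a + b * x.

    Stand-in model only; the study used machine learning algorithms
    such as gradient boosting trees, not a single-feature line.
    """
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def excess_percent(observed, expected):
    """Assumed definition of excess mortality, in percent of expected."""
    return 100.0 * (observed - expected) / expected


# Synthetic "regions": one made-up feature and a made-up mortality rate.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0, 100)
    y = 80 + 0.5 * x + rng.gauss(0, 5)
    data.append((x, y))

train, test = train_test_split(data, train_frac=0.7)
a, b = fit_line(train)

# Score only on the held-out 30%, never on the training rows.
observed = [y for _, y in test]
predicted = [a + b * x for x, _ in test]
r2 = r_squared(observed, predicted)

print(f"train={len(train)} test={len(test)} R2={r2:.2f}")
print(f"example excess: {excess_percent(127.0, 100.0):.0f}%")
```

Scoring on held-out rows is what makes R² a measure of predictive performance rather than of fit. Under the assumed definition, a region with an observed rate of 127 against an expected 100 has 27% excess; the spatial scan step that tests whether such excesses cluster geographically (SaTScan in the study) is not reproduced here.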

Keywords: Brazil; cancer; machine-learning; socioeconomic; spatial-clusters.

MeSH terms

  • Algorithms
  • Brazil / epidemiology
  • Humans
  • Machine Learning
  • Neoplasms*

Grants and funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.