Deep Learning-Based Assessment of Built Environment From Satellite Images and Cardiometabolic Disease Prevalence

JAMA Cardiol. 2024 May 1:e240749. doi: 10.1001/jamacardio.2024.0749. Online ahead of print.

Abstract

Importance: Built environment plays an important role in development of cardiovascular disease. Large scale, pragmatic evaluation of built environment has been limited owing to scarce data and inconsistent data quality.

Objective: To investigate the association between image-based built environment and the prevalence of cardiometabolic disease in urban cities.

Design, setting, and participants: This cross-sectional study used features extracted from Google satellite images (GSI) to measure the built environment and link them with prevalence of cardiometabolic disease. Convolutional neural networks, light gradient-boosting machines, and activation maps were used to assess the association with health outcomes and identify feature associations with coronary heart disease (CHD), stroke, and chronic kidney disease (CKD). The study obtained aerial images from GSI covering census tracts in 7 cities (Cleveland, Ohio; Fremont, California; Kansas City, Missouri; Detroit, Michigan; Bellevue, Washington; Brownsville, Texas; and Denver, Colorado). The study used census tract-level data from the US Centers for Disease Control and Prevention's 500 Cities project. The data were originally collected from the Behavioral Risk Factor Surveillance System that surveyed people 18 years and older across the country. Analyses were conducted from February to December 2022.

Exposures: GSI images of built environment and cardiometabolic disease prevalence.

Main outcomes and measures: Census tract-level estimated prevalence of CHD, stroke, and CKD based on image-based built environment features.

Results: The study obtained 31 786 aerial images from GSI covering 789 census tracts. Built environment features extracted from GSI using machine learning were associated with prevalence of CHD (R2 = 0.60), stroke (R2 = 0.65), and CKD (R2 = 0.64). The model performed better at distinguishing differences between cardiometabolic prevalence between cities than within cities (eg, highest within-city R2 = 0.39 vs between-city R2 = 0.64 for CKD). Addition of GSI features both outperformed and improved the model that only included age, sex, race, income, education, and composite indices for social determinants of health (R2 = 0.83 vs R2 = 0.76 for CHD; P <.001). Activation maps from the features revealed certain health-related built environment such as roads, highways, and railroads and recreational facilities such as amusement parks, arenas, and baseball parks.

Conclusions and relevance: In this cross-sectional study, a significant portion of cardiometabolic disease prevalence was associated with GSI-based built environment using convolutional neural networks.