Clustering by periodontitis-associated factors: A novel application to NHANES data

J Periodontol. 2021 Aug;92(8):1136-1150. doi: 10.1002/JPER.20-0489. Epub 2021 Jan 19.

Abstract

Background: Unsupervised clustering is a method used to identify heterogeneity among groups and homogeneity within a group of patients. Without a prespecified outcome entry, the resulting model deciphers patterns that may not be disclosed using traditional methods. This is the first time such clustering analysis has been applied to identify unique subgroups at high risk for periodontitis in the National Health and Nutrition Examination Survey (NHANES) 2009 to 2014 data sets using >500 variables.

Methods: Questionnaire, examination, and laboratory data (33 tables) for >1,000 variables were merged from 14,072 respondents who underwent clinical periodontal examination. Participants with ≥6 teeth and available data for all selected categories were included (N = 1,222). Data wrangling produced 519 variables. k-means/modes clustering (k = 2 to 14) was deployed. The optimal k-value was determined through the elbow method, using the sum-of-squares formula $\sum x_i^2 - (\sum x_i)^2 / n$. The 5-cluster model showing the highest variability (63.08%) was selected. The 2012 Centers for Disease Control and Prevention/American Academy of Periodontology (AAP) and 2018 European Federation of Periodontology/AAP periodontitis case definitions were applied.
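
The elbow procedure described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: it assumes plain k-means on numeric toy data (the study used k-means/modes on mixed NHANES variables), and it assumes that the reported "variability" corresponds to the share of the total sum of squares that lies between clusters. All function names (`sum_of_squares`, `total_ss`, `kmeans`, `within_ss`, `elbow_curve`) are hypothetical and chosen for illustration only.

```python
import random


def sum_of_squares(values):
    """Sum of squared deviations from the mean: sum(x^2) - (sum(x))^2 / n."""
    n = len(values)
    if n == 0:
        return 0.0
    return sum(v * v for v in values) - sum(values) ** 2 / n


def total_ss(points):
    """Apply the sum-of-squares formula to each column and add the results."""
    return sum(sum_of_squares(col) for col in zip(*points))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on numeric tuples (assumption: no categorical data)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct observations as starting centers
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Recompute each center as the mean of its members; keep empty clusters fixed.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def within_ss(points, labels):
    """Total within-cluster sum of squares for a given labeling."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return sum(total_ss(members) for members in clusters.values())


def elbow_curve(points, ks):
    """Map each k to (within-cluster SS, between-cluster share of total SS)."""
    tss = total_ss(points)
    curve = {}
    for k in ks:
        wss = within_ss(points, kmeans(points, k))
        curve[k] = (wss, 1.0 - wss / tss)
    return curve


# Toy data: three synthetic 2-D blobs, not NHANES data.
rng = random.Random(42)
toy = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(30)
]
for k, (wss, share) in elbow_curve(toy, range(2, 7)).items():
    print(f"k={k}  within SS={wss:.2f}  between/total={share:.2%}")
```

Plotting the within-cluster sum of squares against k and looking for the point where further increases in k yield only small reductions is the usual way to read the elbow; the between/total share printed here is one possible interpretation of the percentage quoted above.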

Results: Cluster 1 (n = 249) showed the highest prevalence of severe periodontitis (43%); 39% self-reported "fair" general health; 55% had household income <$35,000/year; and 48% were current smokers. Cluster 2 (n = 154) had one participant with periodontitis. Cluster 3 (n = 242) had the highest prevalence of moderate periodontitis (53%). In Cluster 4 (n = 35), only one participant had no periodontitis. Cluster 5 (n = 542) was the systemically healthiest, with 77% having no/mild periodontitis.

Conclusion: Clustering of NHANES demographic, systemic health, and socioeconomic data effectively identifies characteristics that are statistically significantly related to periodontitis status and hence detects subpopulations at high risk for periodontitis without costly clinical examinations.

Keywords: chronic periodontitis; cluster analysis; dental health surveys; knowledge discovery; patient reported outcome measures.

MeSH terms

  • Centers for Disease Control and Prevention, U.S.
  • Cluster Analysis
  • Humans
  • Nutrition Surveys
  • Periodontitis* / epidemiology
  • Prevalence
  • United States / epidemiology