Logic Forest: an ensemble classifier for discovering logical combinations of binary markers

Bioinformatics. 2010 Sep 1;26(17):2183-9. doi: 10.1093/bioinformatics/btq354. Epub 2010 Jul 13.

Abstract

Motivation: Highly sensitive and specific screening tools may reduce disease-related mortality by enabling physicians to diagnose diseases in asymptomatic patients or at-risk individuals. Diagnostic tests based on multiple biomarkers may achieve the needed sensitivity and specificity to realize this clinical gain.

Results: Logic regression, a multivariable regression method predicting an outcome using logical combinations of binary predictors, yields interpretable models of the complex interactions in biologic systems. However, its performance degrades in noisy data. We extend logic regression for classification to an ensemble of logic trees (Logic Forest, LF). We conduct simulation studies comparing the ability of logic regression and LF to identify variable interactions predictive of disease status. Our findings indicate LF is superior to logic regression for identifying important predictors. We apply our method to single nucleotide polymorphism data to determine associations of genetic and health factors with periodontal disease.
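
Illustration: the idea of an ensemble of logical rules fitted to bootstrap samples can be sketched in a few lines of Python. This is a toy under stated assumptions, not the LF implementation or the logic regression fitting procedure. Each "tree" here is restricted to a single two-marker rule (AND or OR of two possibly negated markers) found by exhaustive search. The function names (fit_logic_pair, fit_forest, forest_predict, marker_frequency), the simulated data, and the count of how often each marker is selected are all hypothetical stand-ins chosen for illustration.

```python
import itertools
import random


def predict_rule(rule, row):
    """Evaluate one two-marker logic rule on a row of binary markers."""
    i, neg_i, j, neg_j, op = rule
    a = bool(row[i]) != neg_i  # marker i, negated when neg_i is True
    b = bool(row[j]) != neg_j  # marker j, negated when neg_j is True
    return int(a and b) if op == "and" else int(a or b)


def fit_logic_pair(X, y):
    """Exhaustive search for the two-marker rule with the highest accuracy.

    Assumption: this brute-force search replaces the model search that
    real logic regression performs over larger logic trees.
    """
    best_acc, best_rule = -1, None
    for i, j in itertools.combinations(range(len(X[0])), 2):
        for neg_i, neg_j in itertools.product((False, True), repeat=2):
            for op in ("and", "or"):
                rule = (i, neg_i, j, neg_j, op)
                acc = sum(predict_rule(rule, r) == t for r, t in zip(X, y))
                if acc > best_acc:
                    best_acc, best_rule = acc, rule
    return best_rule


def fit_forest(X, y, n_trees=25, seed=0):
    """Fit one rule per bootstrap sample (sampling rows with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    rules = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        rules.append(fit_logic_pair([X[k] for k in idx], [y[k] for k in idx]))
    return rules


def forest_predict(rules, row):
    """Majority vote over the fitted rules."""
    votes = sum(predict_rule(rule, row) for rule in rules)
    return int(2 * votes > len(rules))


def marker_frequency(rules, n_features):
    """Count how often each marker appears in the selected rules."""
    counts = [0] * n_features
    for i, _, j, _, _ in rules:
        counts[i] += 1
        counts[j] += 1
    return counts


# Simulated, noise-free data: 4 binary markers, every combination repeated
# 10 times, with outcome = marker 1 AND NOT marker 2 (0-based indices).
X = [list(c) for c in itertools.product((0, 1), repeat=4)] * 10
y = [int(row[1] == 1 and row[2] == 0) for row in X]

single_rule = fit_logic_pair(X, y)
rules = fit_forest(X, y)
print(single_rule)
print(marker_frequency(rules, n_features=4))
```

On these noise-free data a single rule already recovers the generating interaction. The point of the ensemble is that selection counts across bootstrap samples give a rough indication of which markers are picked consistently when labels are noisy. The importance measures used by LF itself are defined in the paper and the CRAN package, not by this sketch.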

Availability: LF code is publicly available on CRAN, http://cran.r-project.org/.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Biomarkers / analysis*
  • Computer Simulation
  • Humans
  • Models, Biological*
  • Multivariate Analysis
  • Periodontal Diseases / diagnosis
  • Periodontal Diseases / genetics
  • Polymorphism, Single Nucleotide
  • Regression Analysis*

Substances

  • Biomarkers