A biological continuum based approach for efficient clinical classification

J Biomed Inform. 2014 Feb:47:28-38. doi: 10.1016/j.jbi.2013.09.002. Epub 2013 Sep 12.

Abstract

Clinical feature selection is the task of identifying a subset of informative clinical features that support accurate clinical diagnosis. The task has pragmatic value in clinical settings because each clinical test carries a different financial cost, diagnostic value, and risk in obtaining the measurement. Moreover, as new clinical features are continually introduced, repeating the feature selection task can be very time-consuming. To address this issue, we propose a novel feature selection technique for the diagnosis of myocardial infarction (MI), one of the leading causes of morbidity and mortality in many high-income countries. The method combines the conceptual framework of the biological continuum, the optimization capability of a genetic algorithm for feature selection, and the classification ability of a support vector machine. Using these components, a network of clinical risk factors, called the biological continuum based etiological network (BCEN), was constructed. The proposed method was evaluated on the Cardiovascular Health Study (CHS) dataset. Results demonstrate that a significant 4.73-fold speedup can be achieved in the development of an MI classification model. The key advantage of this methodology is that it provides a reusable feature-subset paradigm for the efficient development of up-to-date and efficacious clinical classification models.
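To make the wrapper idea concrete, the following is a minimal sketch of genetic-algorithm feature selection in which each candidate feature subset is scored by a classifier. It is not the authors' implementation: the abstract does not give the algorithm's details, so the population size, crossover and mutation scheme, and the per-feature penalty (a crude stand-in for the cost of a clinical test) are all assumptions. To keep the sketch dependency-free, a nearest-centroid classifier is swapped in for the support vector machine, the data are synthetic rather than CHS, and the BCEN itself is not modeled. All function names are hypothetical.

```python
import random


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Accuracy of a nearest-centroid classifier using only features where mask is 1.

    Stand-in for the SVM named in the abstract (assumption, chosen to avoid
    third-party dependencies).
    """
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    correct = 0
    for x, y in zip(X_test, y_test):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(idx, centroids[c])),
        )
        correct += pred == y
    return correct / len(y_test)


def ga_select(fitness, n_features, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Search over binary feature masks with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]      # truncation selection
        children = [scored[0][:]]              # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


def make_toy_data(n, n_features, rng):
    """Synthetic two-class data: only feature 0 carries signal (assumption)."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def demo(seed=0):
    rng = random.Random(seed)
    n_features = 6
    X_train, y_train = make_toy_data(200, n_features, rng)
    X_test, y_test = make_toy_data(100, n_features, rng)

    def fitness(mask):
        acc = nearest_centroid_accuracy(X_train, y_train, X_test, y_test, mask)
        return acc - 0.01 * sum(mask)  # hypothetical penalty per selected feature

    return ga_select(fitness, n_features, seed=seed), fitness


if __name__ == "__main__":
    best_mask, fitness = demo()
    print("selected mask:", best_mask, "fitness:", round(fitness(best_mask), 3))
```

Because every fitness evaluation trains and tests a classifier, the cost of the search grows with the number of candidate features. This is why a reusable feature subset can shorten model development when new features are introduced.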

Keywords: Classification; Dimensionality reduction; Etiological network; Feature selection; Genetic algorithm; Support vector machine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Aging*
  • Algorithms
  • Artificial Intelligence
  • Bayes Theorem
  • California
  • Cardiovascular Diseases / classification
  • Cohort Studies
  • Data Collection
  • Humans
  • Maryland
  • Medical Informatics / methods*
  • Models, Theoretical
  • North Carolina
  • Pattern Recognition, Automated / methods*
  • Pennsylvania
  • Risk Factors
  • Rural Population
  • Support Vector Machine*
  • Urban Population