Machine learning approaches for phenotype-genotype mapping: predicting heterozygous mutations in the CYP21B gene from steroid profiles

Eur J Endocrinol. 2005 Aug;153(2):301-5. doi: 10.1530/eje.1.01957.

Abstract

Objective: Non-linear relations between multiple biochemical parameters are the basis for the diagnosis of many diseases. Traditional linear analytical methods are not reliable predictors in this setting. Novel non-linear techniques are increasingly used to improve the diagnostic accuracy of automated data interpretation, as has been shown in particular for the classification and diagnostic prediction of cancers from expression profiling data. Our objective was to predict the genotype from complex biochemical data and to compare the performance of experienced clinicians with that of traditional linear analysis and of novel non-linear analytical methods.

Design and methods: As a model, we used a well-defined set of interconnected data consisting of unstimulated serum levels of steroid intermediates assessed in 54 subjects heterozygous for a mutation of the 21-hydroxylase gene (CYP21B) and in 43 healthy controls.
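As a point of reference for reading the accuracy figures, the class balance of this cohort can be worked out directly from the two group sizes given above. The short sketch below only uses those two numbers; the variable names are illustrative, and it assumes (the abstract does not say) that accuracy is counted over all subjects.

```python
# Cohort sizes are taken from the abstract; the names are illustrative.
n_carriers = 54   # subjects heterozygous for a CYP21B mutation
n_controls = 43   # healthy controls

n_total = n_carriers + n_controls

# Accuracy of a trivial rule that assigns every subject to the larger group.
# Assumption: accuracy is the fraction of all subjects labelled correctly.
majority_baseline = max(n_carriers, n_controls) / n_total

print(f"subjects: {n_total}")                                        # subjects: 97
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")  # 55.7%
```

Because the two groups are of similar size, a rule that ignores the steroid profile entirely would already be right for slightly more than half of the subjects under this assumption.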

Results: The genetic alteration was predicted from the pattern of steroid levels with an accuracy of 39% by clinicians and of 64% by linear analysis. In contrast, non-linear methods, such as self-organizing artificial neural networks, support vector machines, and nearest neighbour classifiers, achieved higher accuracies of up to 83%.
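To make the idea of a nearest neighbour classifier concrete, the following is a minimal sketch, not the authors' implementation. It runs on purely synthetic two-dimensional points (an inner ring of 43 "control" points surrounded by an outer ring of 54 "carrier" points), a layout that no single straight line can separate. The two features, the ring geometry, the choice of a single neighbour, and the leave-one-out evaluation are all assumptions made for illustration; the abstract does not specify the features, the number of neighbours, or the validation scheme.

```python
import math

def make_ring(n, radius, label):
    """Place n points evenly on a circle. Purely synthetic, not study data."""
    return [((radius * math.cos(2 * math.pi * i / n),
              radius * math.sin(2 * math.pi * i / n)), label)
            for i in range(n)]

def predict_1nn(train, x):
    """Return the label of the training point closest to x (Euclidean distance)."""
    nearest = min(train, key=lambda item: math.dist(item[0], x))
    return nearest[1]

def leave_one_out_accuracy(data):
    """Classify each sample using all remaining samples as the training set."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if predict_1nn(train, x) == y:
            correct += 1
    return correct / len(data)

# Hypothetical stand-in for the cohort: group sizes match the abstract,
# but the two coordinates are invented and carry no biochemical meaning.
data = make_ring(43, 1.0, "control") + make_ring(54, 3.0, "carrier")

accuracy = leave_one_out_accuracy(data)
print(f"leave-one-out accuracy: {accuracy:.0%}")
```

The toy rings are cleanly separated by construction, so every point's nearest neighbour lies on its own ring and the printed score is perfect. That result says nothing about real steroid profiles; the sketch only shows how a distance-based rule can follow a class boundary that a linear rule cannot.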

Conclusions: The successful application of these non-linear adaptive methods to specific biochemical problems may have general implications for biochemical testing in many areas. Non-linear analytical techniques such as neural networks, support vector machines, and nearest neighbour classifiers may serve as an important adjunct to the decision process of a human investigator not 'trained' in a specific complex clinical or laboratory setting and may help them classify the problem more directly.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Artificial Intelligence*
  • Chromosome Mapping / methods*
  • Genotype
  • Heterozygote
  • Humans
  • Linear Models
  • Middle Aged
  • Models, Genetic*
  • Mutation
  • Nonlinear Dynamics
  • Phenotype
  • Predictive Value of Tests
  • Steroid 21-Hydroxylase / genetics*
  • Steroids / blood*

Substances

  • Steroids
  • Steroid 21-Hydroxylase