A stochastic expectation and maximization algorithm for detecting quantitative trait-associated genes

Bioinformatics. 2011 Jan 1;27(1):63-9. doi: 10.1093/bioinformatics/btq558. Epub 2010 Oct 29.

Abstract

Motivation: Most biological traits may be correlated with underlying gene expression patterns, which are partially determined by DNA sequence variation. The correlations between gene expression levels and quantitative traits are essential for understanding gene function and dissecting gene regulatory networks.

Results: In the present study, we adopted a novel statistical method, the stochastic expectation and maximization (SEM) algorithm, to analyze the associations between gene expression levels and quantitative trait values and to identify the genetic loci controlling variation in gene expression. In the first step, gene expression levels measured in microarray experiments were assigned to two clusters according to the strength of their association with the phenotypes of the quantitative trait under investigation. In the second step, genes associated with the trait were mapped to genetic loci in the genome. Because gene expression levels are themselves quantitative, the genetic loci controlling these expression traits are called expression quantitative trait loci (eQTL). We applied the same SEM algorithm to a real dataset from a barley genetic experiment that included both quantitative traits and gene expression traits. For the first time, we identified genes associated with eight agronomic traits of barley. These genes were then mapped to the seven chromosomes of the barley genome. The SEM algorithm and the results of the barley data analysis are useful to scientists in bioinformatics and plant breeding.
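
The first step described above (assigning genes to two clusters by strength of association with a trait) can be illustrated with a minimal Python sketch. This is not the authors' R program, and the abstract does not specify their model. The sketch assumes a simple stand-in: each gene is scored by the Fisher z-transformed Pearson correlation between its expression and the trait, and the scores are modeled as a two-component zero-mean normal mixture (a narrow "unassociated" cluster and a wide "associated" cluster). The function names, the score definition, the mixture form, and the simulated data are all hypothetical choices made for illustration. What the sketch does share with a stochastic EM scheme is the sampling step: cluster labels are drawn at random from their posterior probabilities before the parameters are updated.

```python
import numpy as np


def association_scores(expr, trait):
    """Per-gene score: Fisher z-transformed Pearson correlation with the trait.

    expr  : (n_samples, n_genes) expression matrix (assumed layout, no constant columns)
    trait : (n_samples,) quantitative trait values
    Under no association each score is approximately N(0, 1).
    """
    expr = np.asarray(expr, dtype=float)
    trait = np.asarray(trait, dtype=float)
    n = expr.shape[0]
    xc = expr - expr.mean(axis=0)
    yc = trait - trait.mean()
    r = (xc.T @ yc) / (np.sqrt((xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    r = np.clip(r, -0.999999, 0.999999)  # keep arctanh finite
    return np.arctanh(r) * np.sqrt(n - 3)


def stochastic_em(scores, n_iter=300, burn_in=100, seed=0):
    """Stochastic EM for a two-cluster, zero-mean normal mixture on the scores.

    Cluster 0 (narrow) stands for unassociated genes, cluster 1 (wide) for
    associated genes. Returns each gene's posterior probability of belonging
    to cluster 1, averaged over the iterations after burn-in.
    """
    rng = np.random.default_rng(seed)
    z2 = np.asarray(scores, dtype=float) ** 2
    pi1, var0, var1 = 0.5, 0.5 * z2.mean(), 2.0 * z2.mean()
    post_sum = np.zeros_like(z2)
    for it in range(n_iter):
        # E-step: posterior probability of cluster 1, computed in log space.
        log0 = np.log1p(-pi1) - 0.5 * np.log(var0) - 0.5 * z2 / var0
        log1 = np.log(pi1) - 0.5 * np.log(var1) - 0.5 * z2 / var1
        post = np.exp(log1 - np.logaddexp(log0, log1))
        # Stochastic step: draw a hard label for every gene from its posterior.
        labels = rng.random(z2.size) < post
        # M-step: re-estimate parameters from the sampled labels
        # (skipped if one cluster happens to be empty in this draw).
        n1 = labels.sum()
        if 0 < n1 < z2.size:
            pi1 = n1 / z2.size
            var0 = max(z2[~labels].mean(), 1e-8)
            var1 = max(z2[labels].mean(), 1e-8)
            if var0 > var1:  # keep cluster 1 as the wide cluster
                var0, var1, pi1 = var1, var0, 1.0 - pi1
        if it >= burn_in:
            post_sum += post
    return post_sum / (n_iter - burn_in)


# Simulated example (not the barley data): 150 lines, 500 genes,
# of which the first 20 are constructed to correlate with the trait.
rng = np.random.default_rng(1)
trait = rng.normal(size=150)
expr = rng.normal(size=(150, 500))
expr[:, :20] = 0.8 * trait[:, None] + 0.6 * expr[:, :20]

posterior = stochastic_em(association_scores(expr, trait))
print("genes with posterior > 0.5:", np.flatnonzero(posterior > 0.5))
```

Genes whose averaged posterior exceeds a chosen threshold would be carried forward to the second step, in which their expression levels are treated as traits and mapped to genetic loci. Averaging the posterior over iterations after a burn-in period is one common way to summarize a stochastic EM run, since the parameter estimates fluctuate from draw to draw rather than settling on a single point.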

Availability and implementation: The R program for the SEM algorithm can be downloaded from our website: http://www.statgen.ucr.edu.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Chromosome Mapping
  • Cluster Analysis
  • Gene Expression Profiling*
  • Gene Expression*
  • Gene Regulatory Networks
  • Genetic Association Studies
  • Hordeum / genetics
  • Linear Models
  • Oligonucleotide Array Sequence Analysis
  • Phenotype
  • Quantitative Trait Loci*
  • Stochastic Processes