Phenotype sequencing: identifying the genes that cause a phenotype directly from pooled sequencing of independent mutants

PLoS One. 2011 Feb 18;6(2):e16517. doi: 10.1371/journal.pone.0016517.

Abstract

Random mutagenesis and phenotype screening provide a powerful method for dissecting microbial functions, but their results can be laborious to analyze experimentally. Each mutant strain may contain 50-100 random mutations, necessitating extensive functional experiments to determine which one causes the selected phenotype. To solve this problem, we propose a "Phenotype Sequencing" approach in which genes causing the phenotype can be identified directly from sequencing of multiple independent mutants. We developed a new computational analysis method showing that (1) causal genes can be identified with high probability from even a modest number of mutant genomes; and (2) costs can be cut many-fold compared with a conventional genome sequencing approach via an optimized strategy of library-pooling (multiple strains per library) and tag-pooling (multiple tagged libraries per sequencing lane). We performed extensive validation experiments on a set of E. coli mutants with increased isobutanol biofuel tolerance. We generated a range of sequencing experiments varying from 3 to 32 mutant strains, with pooling on 1 to 3 sequencing lanes. Our statistical analysis of these data (4099 mutations from 32 mutant genomes) successfully identified 3 genes (acrB, marC, acrA) that have been independently validated as causing this experimental phenotype. Our approach also reduces mutant sequencing costs substantially: whereas a conventional genome sequencing experiment would have cost $7,200 in reagents alone, our Phenotype Sequencing design yielded the same information value for only $1,200, a six-fold reduction. In fact, our smallest experiments reliably identified acrB and marC at a cost of only $110-$340.
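
The statistical idea behind the abstract, that a causal gene should be hit by mutations in more independent mutants than chance alone would predict, can be illustrated with a minimal sketch. This is not the authors' published analysis method. It assumes a uniform per-base mutation rate and a simple Poisson model with a Bonferroni correction. The genome length (about 4.6 Mb for E. coli K-12), the gene length, the gene count, and the hit counts below are illustrative assumptions that do not appear in the abstract; only the totals (4099 mutations, $7,200 and $1,200) are taken from it.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def gene_hit_pvalue(hits, gene_length, total_mutations, genome_length):
    """P-value for seeing at least `hits` mutations in one gene.

    Assumption: mutations fall uniformly at random along the genome, so the
    expected count in a gene is proportional to its length.
    """
    expected = total_mutations * gene_length / genome_length
    return poisson_tail(hits, expected)


def bonferroni(p_value, n_tests):
    """Conservative multiple-testing correction across all genes tested."""
    return min(1.0, p_value * n_tests)


def fold_reduction(conventional_cost, pooled_cost):
    """How many times cheaper the pooled design is."""
    return conventional_cost / pooled_cost


if __name__ == "__main__":
    TOTAL_MUTATIONS = 4099        # from the abstract
    GENOME_LENGTH = 4_600_000     # assumption: approximate E. coli K-12 size
    GENE_LENGTH = 3_000           # hypothetical gene length in bp
    N_GENES = 4_000               # hypothetical number of genes tested

    for hits in (3, 10, 20):      # hypothetical observed hit counts
        p = gene_hit_pvalue(hits, GENE_LENGTH, TOTAL_MUTATIONS, GENOME_LENGTH)
        print(hits, p, bonferroni(p, N_GENES))

    print(fold_reduction(7200, 1200))  # costs quoted in the abstract
```

Under this toy model, a gene mutated in only a few strains is indistinguishable from background, while a gene hit far more often than its length predicts stands out even after correction. A real analysis would also need to account for pooling, sequencing error, and non-uniform mutation spectra.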

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Base Sequence
  • Butanols / pharmacology
  • Drug Tolerance / genetics
  • Drug Tolerance / physiology
  • Escherichia coli / genetics
  • Gene Library
  • Genetic Association Studies / methods*
  • High-Throughput Nucleotide Sequencing / methods
  • Models, Biological
  • Models, Theoretical
  • Molecular Sequence Data
  • Mutant Proteins / analysis
  • Mutant Proteins / genetics*
  • Mutation / physiology
  • Oligonucleotide Array Sequence Analysis / methods
  • Organisms, Genetically Modified
  • Phenotype*
  • Sequence Analysis, DNA / methods*
  • Specimen Handling / methods*

Substances

  • Butanols
  • Mutant Proteins
  • isobutyl alcohol