Efficient development of highly polymorphic microsatellite markers based on polymorphic repeats in transcriptome sequences of multiple individuals

Mol Ecol Resour. 2015 Jan;15(1):17-27. doi: 10.1111/1755-0998.12289. Epub 2014 Jun 28.

Abstract

The first hurdle in developing microsatellite markers, cloning, has been overcome by next-generation sequencing. The second hurdle is testing to differentiate polymorphic from nonpolymorphic loci. The third hurdle, somewhat hidden, is that only polymorphic markers with a large effective number of alleles are sufficiently informative to be deployed in multiple studies. The latter two steps are laborious and still performed manually. We have developed a strategy in which we first screen reads from multiple genotypes for repeats that show the most length variants, and only these are subsequently developed into markers. We validated our strategy in tetraploid garden rose using Illumina paired-end transcriptome sequences of 11 roses. Of 48 markers tested, two failed to amplify, but all others were polymorphic. Ten markers amplified more than one locus, indicating duplicated genes or gene families. Completely avoiding duplicated loci will be difficult because the ranges of predicted allele numbers for highly polymorphic single-locus and multilocus markers largely overlapped. Of the remainder, half were replicate markers (i.e. multiple primer pairs for one locus), indicating the difficulty of correctly filtering short reads containing repeat sequences. We subsequently refined the approach to eliminate multiple primer sets for the same locus. The remaining 18 markers were all highly polymorphic, amplifying on average 11.7 alleles per marker (range = 6-20) in 11 tetraploid roses, exceeding the 8.2 alleles per marker of the 24 most polymorphic markers genotyped previously. This strategy therefore represents a major step forward in the development of highly polymorphic microsatellite markers.
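The screening idea described above (count repeat-length variants per locus across reads from several genotypes, then keep the loci with the most variants) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which the abstract does not specify. The thresholds `MIN_UNITS` and `FLANK`, the function names, the use of flanking sequence to identify a locus, and the example reads are all assumptions made for illustration. The last function applies the standard definition of the effective number of alleles, 1 / sum(p_i^2), where p_i is the frequency of allele i.

```python
import re
from collections import defaultdict

# Hypothetical thresholds chosen for illustration, not taken from the paper.
MIN_UNITS = 4   # minimum number of tandem copies of a motif to call a repeat
FLANK = 8       # bases of flanking sequence on each side used to name a locus

# Group 2 is a 2-3 bp motif; group 1 is the motif repeated >= MIN_UNITS times.
SSR = re.compile(r"(([ACGT]{2,3})\2{%d,})" % (MIN_UNITS - 1))


def count_length_variants(reads_by_individual):
    """Map (left flank, motif, right flank) -> set of observed repeat counts."""
    variants = defaultdict(set)
    for reads in reads_by_individual.values():
        for read in reads:
            for match in SSR.finditer(read):
                motif = match.group(2)
                if len(set(motif)) == 1:
                    continue  # skip homopolymer runs such as AAAAAAAA
                left = read[max(0, match.start() - FLANK):match.start()]
                right = read[match.end():match.end() + FLANK]
                if len(left) < FLANK or len(right) < FLANK:
                    continue  # repeat is not fully spanned by this read
                units = len(match.group(1)) // len(motif)
                variants[(left, motif, right)].add(units)
    return variants


def rank_loci(variants):
    """Sort loci so that those with the most length variants come first."""
    return sorted(variants.items(), key=lambda kv: len(kv[1]), reverse=True)


def effective_number_of_alleles(allele_counts):
    """Effective number of alleles: 1 / sum of squared allele frequencies."""
    total = sum(allele_counts)
    return 1.0 / sum((c / total) ** 2 for c in allele_counts)


# Made-up reads for three hypothetical individuals.
example_reads = {
    "ind1": ["TTGACGTC" + "AG" * 5 + "CCTGAACG",
             "GGATCCTA" + "CTT" * 4 + "TCGATTGC"],
    "ind2": ["TTGACGTC" + "AG" * 6 + "CCTGAACG",
             "GGATCCTA" + "CTT" * 4 + "TCGATTGC"],
    "ind3": ["TTGACGTC" + "AG" * 8 + "CCTGAACG"],
}
ranked = rank_loci(count_length_variants(example_reads))
```

In this toy input the AG repeat is seen with three different lengths and therefore ranks above the CTT repeat, which shows a single length. A real screen would also have to handle sequencing errors, reverse-complement reads, and paralogous loci, which the abstract identifies as a practical difficulty.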

Keywords: RNA-seq; microsatellite marker; next-generation sequencing; simple sequence repeat.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Computational Biology / methods*
  • Genetic Variation*
  • Genotyping Techniques / methods*
  • Microsatellite Repeats*
  • Molecular Sequence Data
  • Repetitive Sequences, Nucleic Acid*
  • Rosa / classification
  • Rosa / genetics
  • Sequence Analysis, DNA
  • Transcriptome*

Associated data

  • GENBANK/HG934830
  • GENBANK/HG934831
  • GENBANK/HG934832
  • GENBANK/HG934834
  • GENBANK/HG934835
  • GENBANK/HG934836
  • GENBANK/HG934837
  • GENBANK/HG934838
  • GENBANK/HG934840
  • GENBANK/HG934843
  • GENBANK/HG934844
  • GENBANK/HG934845
  • GENBANK/HG934846
  • GENBANK/HG934848
  • GENBANK/HG934849
  • GENBANK/HG934850
  • GENBANK/HG934851
  • GENBANK/LK392375