Identification of conserved and polymorphic STRs for personal genomes

BMC Genomics. 2014;15 Suppl 10(Suppl 10):S3. doi: 10.1186/1471-2164-15-S10-S3. Epub 2014 Dec 12.

Abstract

Background: Short tandem repeats (STRs) are abundant in human genomes. Numerous STRs have been shown to be associated with genetic diseases and gene regulatory functions, and have been selected as genetic markers for evolutionary and forensic analyses. High-throughput next generation sequencers have fostered new cutting-edge computing techniques for genome-scale analyses, and cross-genome comparisons have facilitated the efficient identification of polymorphic STR markers for various applications.

Results: An automated and efficient system for detecting human polymorphic STRs at the genome scale is proposed in this study. Assembled contigs from next generation sequencing data were aligned and calibrated according to selected reference sequences. To verify identified polymorphic STRs, human genomes from the 1000 Genomes Project were employed for comprehensive analyses, and STR markers from the Combined DNA Index System (CODIS) and disease-related STR motifs were also applied as cases for evaluation. In addition, we analyzed STR variations for highly conserved homologous genes and human-unique genes. In total 477 polymorphic STRs were identified from 492 human-unique genes, among which 26 STRs were retrieved and clustered into three different groups for efficient comparison.

Conclusions: We have developed an online system that efficiently identifies polymorphic STRs and provides novel distinguishable STR biomarkers for different levels of specificity. Candidate polymorphic STRs within a personal genome could be easily retrieved and compared to the constructed STR profile through query keywords, gene names, or assembled contigs.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Chromosomes, Human
  • Computational Biology / methods*
  • Conserved Sequence
  • Databases, Nucleic Acid
  • Disease / genetics*
  • Genome, Human*
  • Humans
  • Microsatellite Repeats*
  • Models, Statistical
  • Sequence Analysis, DNA / methods*
  • Species Specificity