Mouse BAC ends quality assessment and sequence analyses

Genome Res. 2001 Oct;11(10):1736-45. doi: 10.1101/gr.179201.

Abstract

A large-scale BAC end-sequencing project at The Institute for Genomic Research (TIGR) has generated one of the most extensive sets of sequence markers for the mouse genome to date. Using ABI3700 capillary sequencers, with a sequencing success rate of >80% and an average read length of 485 bp, we have generated 449,234 nonredundant mouse BAC end sequences (mBESs) totaling 218 Mb from 257,318 clones from libraries RPCI-23 and RPCI-24, representing 15x clone coverage, 7% sequence coverage, and a marker every 7 kb across the genome. A total of 191,916 BACs have sequences from both ends, providing 12x genome coverage. The average Q20 length is 406 bp, and 84% of the bases have phred quality scores ≥ 20. RPCI-24 mBESs have more Q20 bases and longer reads on average than RPCI-23 sequences. ABI3700 sequencers and the sample tracking system ensure that >95% of mBESs are associated with the right clone identifiers. We have found that a significant fraction of mBESs contain L1 repeats and that approximately 48% of the clones have both ends with ≥ 100 bp of contiguous unique Q20 bases. About 3% of mBESs match ESTs, and >70% of these matches were conserved between the mouse and the human or the rat. Approximately 0.1% of mBESs contain STSs. About 0.2% of mBESs match human finished sequences, and >70% of these sequences have EST hits. The analyses indicate that our high-quality mouse BAC end sequences will be a valuable resource to the community.
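
The coverage and spacing figures quoted in the abstract can be roughly reproduced from the read and clone counts. The sketch below is only a consistency check, not the authors' calculation. The abstract does not state a genome size or an average BAC insert size, so the 3.0 Gb genome and 180 kb mean insert used here are assumptions, and the function name summarize is hypothetical. The ratio of mean Q20 length to mean read length is shown only as a plausibility check on the 84% figure, which the authors may have computed differently.

```python
# Figures quoted in the abstract.
N_READS = 449_234        # nonredundant mBESs
TOTAL_BP = 218e6         # total sequence (218 Mb)
N_CLONES = 257_318       # clones with at least one end sequence
N_PAIRED = 191_916       # clones with sequences from both ends
MEAN_READ_BP = 485       # average read length
MEAN_Q20_BP = 406        # average Q20 length

# ASSUMPTIONS: neither value is given in the abstract.
GENOME_BP = 3.0e9        # assumed mouse genome size
MEAN_INSERT_BP = 180_000 # assumed average BAC insert size


def summarize(genome_bp=GENOME_BP, insert_bp=MEAN_INSERT_BP):
    """Derive the abstract's summary statistics from its raw counts."""
    return {
        # reads x mean read length should be close to the 218 Mb total
        "total_mb_from_mean_read": N_READS * MEAN_READ_BP / 1e6,
        # fraction of the genome covered by end sequence, in percent
        "sequence_coverage_pct": 100 * TOTAL_BP / genome_bp,
        # average distance between end-sequence markers, in kb
        "marker_spacing_kb": genome_bp / N_READS / 1e3,
        # clone (physical) coverage from all end-sequenced clones
        "clone_coverage_x": N_CLONES * insert_bp / genome_bp,
        # clone coverage from clones with both ends sequenced
        "paired_clone_coverage_x": N_PAIRED * insert_bp / genome_bp,
        # ratio of mean Q20 length to mean read length, in percent
        "q20_fraction_pct": 100 * MEAN_Q20_BP / MEAN_READ_BP,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the derived values round to the figures in the abstract (about 218 Mb, 7% sequence coverage, a marker every 7 kb, 15x and 12x clone coverage, 84% Q20 bases). A different assumed genome or insert size shifts the coverage estimates proportionally.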

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Chromosomes, Artificial, Bacterial / genetics*
  • Cloning, Molecular / methods
  • Contig Mapping / methods
  • Expressed Sequence Tags
  • Female
  • Genetic Vectors / genetics
  • Genome
  • Humans
  • Mice
  • Mice, Inbred C57BL
  • Quality Control
  • Repetitive Sequences, Nucleic Acid / genetics
  • Sequence Analysis, DNA / instrumentation
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, DNA / standards
  • Sequence Tagged Sites
  • Software