Effects of sample re-sequencing and trimming on the quality and size of assembled consensus sequences

F Prosdocimi; D A O Lopes; F C Peixoto; M M Mourão; L G G Pacífico; R A Ribeiro; J M Ortega


Genet Mol Res. 2007 Oct 5;6(4):756-65.

Affiliation

Laboratório de Biodados, Departamento de Bioquímica e Imunologia, ICB-UFMG, Belo Horizonte, MG, Brasil (F Prosdocimi).

PMID: 18058703

Abstract

The production of nucleic acid sequences by automated DNA sequencers is always accompanied by some base-calling errors. To obtain a high-quality DNA sequence from a molecule of interest, researchers normally sequence the same sample many times. Because base-calling errors are considered rare events, re-sequencing the same molecule and assembling the resulting reads is frequently thought to be a good way to generate reliable sequences. A relevant question, however, is how many times a sample must be re-sequenced to achieve a high-fidelity sequence while minimizing costs. We examined how both the number of re-sequenced reads and the PHRED trimming parameters affect the accuracy and size of the final consensus sequences. Hundreds of pUC18 reads were produced from single-pool reactions and assembled into consensus sequences with the CAP3 software. Using local alignment against the published pUC18 cloning vector sequence, we identified the position and number of errors in each consensus and stored them in MySQL databases. Stringent PHRED trimming parameters proved efficient at reducing errors; however, they also decreased consensus size. Moreover, re-sequencing had no clear effect on removing consensus errors, although it slightly increased consensus size.
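The trade-off reported above (stricter trimming removes more errors but shortens the consensus) can be illustrated with a small sketch. It assumes a Mott-style trimming rule: each base scores the cutoff error probability minus its own error probability (derived from the PHRED relation Q = -10 log10 P), and the read is trimmed to its maximum-scoring segment. This is a simplified illustration, not phred's own code; the function names `phred_to_error_prob` and `mott_trim` and the example quality values are hypothetical.

```python
def phred_to_error_prob(q: int) -> float:
    """Convert a PHRED quality score Q to an error probability: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mott_trim(quals: list[int], cutoff: float = 0.05) -> tuple[int, int]:
    """Hypothetical Mott-style trimmer (assumption, not phred's implementation).

    Each base scores (cutoff - error probability); the function returns the
    half-open interval [start, end) of the maximum-scoring contiguous segment
    (Kadane's algorithm). A smaller cutoff is more stringent, so fewer bases
    score positively and the retained segment tends to be shorter.
    Returns (0, 0) if no base scores positively.
    """
    best_score, best_start, best_end = 0.0, 0, 0
    running, run_start = 0.0, 0
    for i, q in enumerate(quals):
        running += cutoff - phred_to_error_prob(q)
        if running <= 0:
            # A non-positive running sum cannot improve a later segment; restart after this base.
            running, run_start = 0.0, i + 1
        elif running > best_score:
            best_score, best_start, best_end = running, run_start, i + 1
    return best_start, best_end


# Hypothetical per-base qualities: low at both read ends, high in the middle.
quals = [5, 8, 20, 30, 35, 30, 25, 12, 6]
for cutoff in (0.05, 0.005):
    start, end = mott_trim(quals, cutoff)
    print(cutoff, (start, end), end - start)
# 0.05 (2, 7) 5
# 0.005 (3, 7) 4
```

With the more stringent cutoff, the Q20 base (error probability 0.01) no longer scores positively and is trimmed away, so the retained segment loses one base. Summed over many reads, this kind of loss is one plausible way stringent trimming shrinks the assembled consensus.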

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Base Pair Mismatch
Base Sequence
Consensus Sequence*
Plasmids / genetics
Sequence Analysis, DNA / methods*