Sample Tracking Using Unique Sequence Controls

Richard A Moore; Thomas Zeng; T Roderick Docking; Ian Bosdet; Yaron S Butterfield; Sarah Munro; Irene Li; Lucas Swanson; Elizabeth R Starks; Kane Tse; Andrew J Mungall; Robert A Holt; Aly Karsan

doi:10.1016/j.jmoldx.2019.10.011

J Mol Diagn. 2020 Feb;22(2):141-146. doi: 10.1016/j.jmoldx.2019.10.011. Epub 2019 Dec 16.

Affiliations

¹ Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia; Faculty of Health Science, Simon Fraser University, Burnaby, British Columbia. Electronic address: rmoore@bcgsc.ca.
² Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia.
³ Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada.
⁴ Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia; Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada.
⁵ Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia; Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada. Electronic address: akarsan@bcgsc.ca.

PMID: 31837431

Abstract

Sample tracking and identity are essential when processing multiple samples in parallel. Sequencing applications often involve high sample numbers, and the data are frequently used in a clinical setting. As such, a simple and accurate intrinsic sample tracking process through a sequencing pipeline is essential. Various solutions have been implemented to verify sample identity, including variant detection at the start and end of the pipeline using arrays or genotyping, bioinformatic comparisons, and optical barcoding of samples. None of these approaches is optimal. To establish a more effective approach using genetic barcoding, we developed a panel of unique DNA sequences cloned into a common vector. A unique DNA sequence is added to the sample when it is first received and can be detected by PCR and/or sequencing at any stage of the process. The control sequences are approximately 200 bases long, have low identity (<30 bases) to any sequence in the National Center for Biotechnology Information nonredundant database, and contain no long homopolymer stretches (>7 bases). When a spiked next-generation sequencing library is sequenced, reads derived from the control sequence are generated along with the standard sequencing run and are used to confirm sample identity and determine cross-contamination levels. This approach is used in our targeted clinical diagnostic whole-genome and RNA-sequencing pipelines and is an inexpensive, flexible, and platform-agnostic solution.
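The two computational ideas in the abstract, screening candidate controls for long homopolymer runs and using control-derived reads to confirm identity and estimate cross-contamination, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: all function names, the toy sequences, and the exact k-mer matching strategy are assumptions made for illustration (a production workflow would more likely align reads to the control references and would also handle reverse-complement reads and sequencing errors).

```python
from collections import Counter
from itertools import groupby


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base in seq."""
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


def build_kmer_index(controls, k):
    """Map each k-mer to the control containing it; k-mers shared between controls are dropped."""
    index, shared = {}, set()
    for name, seq in controls.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in index and index[kmer] != name:
                shared.add(kmer)
            index[kmer] = name
    for kmer in shared:
        del index[kmer]
    return index


def count_control_reads(reads, index, k):
    """Count reads whose k-mers point to exactly one control (forward strand only)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        hits = {index[read[i:i + k]]
                for i in range(len(read) - k + 1)
                if read[i:i + k] in index}
        if len(hits) == 1:
            counts[hits.pop()] += 1
    return counts


def check_sample(counts, expected):
    """Confirm the dominant control and report the fraction of control reads from other controls."""
    total = sum(counts.values())
    if total == 0:
        return {"identity_confirmed": False, "contamination_fraction": None}
    top_control = counts.most_common(1)[0][0]
    unexpected = total - counts[expected]
    return {"identity_confirmed": top_control == expected,
            "contamination_fraction": unexpected / total}


if __name__ == "__main__":
    # Toy controls, far shorter than the ~200-base sequences described in the abstract.
    controls = {"ctrlA": "ACGTTGCAAGCTTAGGCTAACGT",
                "ctrlB": "TTGACCGATAGGCATCCGATTAC"}
    k = 8  # hypothetical k-mer size chosen for the toy data
    # Design screen: reject candidates with homopolymer runs longer than 7 bases.
    print({name: longest_homopolymer(seq) <= 7 for name, seq in controls.items()})
    reads = [controls["ctrlA"][0:12], controls["ctrlA"][5:20],
             controls["ctrlB"][2:15], "GGGGGGGGGGGG"]
    counts = count_control_reads(reads, build_kmer_index(controls, k), k)
    print(check_sample(counts, expected="ctrlA"))
```

Here the contamination fraction is simply the share of control-assigned reads that come from a control other than the one spiked into the sample. Any acceptance threshold applied to that fraction would be a laboratory-specific choice that the abstract does not state.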

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computational Biology
DNA Contamination
Databases, Nucleic Acid
Gene Library
High-Throughput Nucleotide Sequencing / methods*
High-Throughput Nucleotide Sequencing / standards*
Humans
Reference Standards
Reproducibility of Results
Sequence Analysis, DNA