A cross-sample statistical model for SNP detection in short-read sequencing data

Omkar Muralidharan; Georges Natsoulis; John Bell; Daniel Newburger; Hua Xu; Itai Kela; Hanlee Ji; Nancy Zhang

doi:10.1093/nar/gkr851

Nucleic Acids Res. 2012 Jan;40(1):e5. doi: 10.1093/nar/gkr851. Epub 2011 Nov 7.

Authors

Omkar Muralidharan¹, Georges Natsoulis, John Bell, Daniel Newburger, Hua Xu, Itai Kela, Hanlee Ji, Nancy Zhang

Affiliation

¹ Department of Statistics, Stanford University, 390 Serra Mall, Stanford, CA, 94305, USA.

Abstract

Highly multiplex DNA sequencers have greatly expanded our ability to survey human genomes for previously unknown single nucleotide polymorphisms (SNPs). However, sequencing and mapping errors, though rare, contribute substantially to the number of false discoveries in current SNP callers. We demonstrate that we can significantly reduce the number of false positive SNP calls by pooling information across samples. Although many studies prepare and sequence multiple samples with the same protocol, most existing SNP callers ignore cross-sample information. In contrast, we propose an empirical Bayes method that uses cross-sample information to learn the error properties of the data. This error information lets us call SNPs with a lower false discovery rate than existing methods.
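The cross-sample idea in the abstract can be illustrated with a toy sketch. It is not the authors' empirical Bayes model, whose details the abstract does not give. It swaps in a much simpler technique: at one genomic position, estimate a sequencing error rate from the median non-reference read fraction across samples, then score each sample with a binomial tail probability against that pooled rate. The function names, the `floor` parameter, the example counts, and the assumption that most samples are homozygous reference at the position are all hypothetical.

```python
import math
from statistics import median


def pooled_error_rate(nonref, depth, floor=1e-4):
    """Estimate a per-position error rate from all samples.

    Assumption (not from the paper): most samples are homozygous
    reference here, so the median non-reference fraction reflects error.
    """
    fractions = [k / n for k, n in zip(nonref, depth) if n > 0]
    if not fractions:
        return floor
    # Clip away from 0 and 1 so the logarithms below stay finite.
    return min(max(median(fractions), floor), 1.0 - floor)


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def score_position(nonref, depth):
    """Return one tail probability per sample; small values suggest a variant."""
    error = pooled_error_rate(nonref, depth)
    return [binom_tail(k, n, error) for k, n in zip(nonref, depth)]


# Hypothetical counts: five samples at depth 100, one carrying ~48% non-reference reads.
nonref_counts = [1, 0, 2, 1, 48]
read_depths = [100, 100, 100, 100, 100]
tail_probs = score_position(nonref_counts, read_depths)
```

In this toy input, the four low-count samples set the pooled error rate, so only the fifth sample stands out. A single-sample caller would have no such position-specific baseline, which is the gap the abstract points to.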

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Alleles
Genotyping Techniques
High-Throughput Nucleotide Sequencing
Models, Statistical*
Polymorphism, Single Nucleotide*
Sequence Analysis, DNA / methods*
