Interpretation of custom designed Illumina genotype cluster plots for targeted association studies and next-generation sequence validation

BMC Res Notes. 2010 Feb 22:3:39. doi: 10.1186/1756-0500-3-39.

Abstract

Background: High-throughput custom designed genotyping arrays are a valuable resource for biologically focused research studies and increasingly for validation of variation predicted by next-generation sequencing (NGS) technologies. We investigate the Illumina GoldenGate chemistry using custom designed VeraCode and sentrix array matrix (SAM) assays for each of these applications, respectively. We highlight applications for interpretation of Illumina generated genotype cluster plots to maximise data inclusion and reduce genotyping errors.

Findings: We illustrate the dramatic effect of outliers in genotype calling and data interpretation, as well as suggest simple means to avoid genotyping errors. Furthermore we present this platform as a successful method for two-cluster rare or non-autosomal variant calling. The success of high-throughput technologies to accurately call rare variants will become an essential feature for future association studies. Finally, we highlight additional advantages of the Illumina GoldenGate chemistry in generating unusually segregated cluster plots that identify potential NGS generated sequencing error resulting from minimal coverage.

Conclusions: We demonstrate the importance of visually inspecting genotype cluster plots generated by the Illumina software and issue warnings regarding commonly accepted quality control parameters. In addition to suggesting applications to minimise data exclusion, we propose that the Illumina cluster plots may be helpful in identifying potential in-put sequence errors, particularly important for studies to validate NGS generated variation.