nf-gwas-pipeline: A Nextflow Genome-Wide Association Study Pipeline

J Open Source Softw. 2021;6(59):2957. doi: 10.21105/joss.02957. Epub 2021 Mar 2.

Abstract

A tool for conducting genome-wide association studies (GWAS) in a systematic, automated, and reproducible manner is overdue. We developed an automated GWAS pipeline that combines multiple analysis tools, including bcftools, vcftools, ANNOVAR, and the R packages SNPRelate, GENESIS, and GMMAT, through Nextflow, a portable, flexible, and reproducible reactive workflow framework for developing pipelines. The pipeline integrates data quality control and assessment with genetic association analyses, including analyses of cross-sectional and longitudinal studies using either single-variant or gene-based tests, in a unified workflow. Dependencies are distributed through Docker, and the code is publicly available on GitHub.