EasyQC: Tool with Interactive User Interface for Efficient Next-Generation Sequencing Data Quality Control

J Comput Biol. 2018 Dec;25(12):1301-1311. doi: 10.1089/cmb.2017.0186. Epub 2018 Sep 8.

Abstract

The advent of next-generation sequencing (NGS) technologies has revolutionized genomic research. Millions of sequences are generated in a short period of time, providing intriguing insights to researchers. Many NGS platforms have evolved over time, and their efficiency has steadily increased. Still, primarily because of the sequencing chemistry, glitches in the sequencing machine, and human handling errors, some artifacts tend to persist in the final sequence data set. These sequence errors have a profound impact on downstream analyses and may provide misleading information. Hence, filtering of these erroneous reads has become essential, and a myriad of tools are available for this purpose. However, many of them are accessible through a command line interface that requires the user to enter each command manually. Here, we report EasyQC, a tool for NGS data quality control (QC) with a graphical user interface providing options to trim NGS reads based on quality, length, homopolymers, and ambiguous bases. EasyQC also provides a format converter, a paired-end merger, an adapter trimmer, and a graph generator that produces quality distribution, length distribution, GC content, and base composition graphs. Comparison of raw and processed sequence data sets using EasyQC suggested a significant increase in the overall quality of the sequences. In tests with NGS data sets on a standalone desktop, EasyQC proved to be relatively fast. EasyQC is developed using Perl modules and can be executed on Windows and Linux platforms. With its various QC features, easy interface for end users, and cross-platform compatibility, EasyQC would be a valuable addition to the existing tools, facilitating better downstream analyses.
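The four filtering criteria named in the abstract (quality, length, homopolymer, and ambiguous bases), together with GC content, can be illustrated with a minimal sketch. This is not EasyQC's implementation, which is written in Perl and is not described here. The `Read` class, the function names, the Phred+33 quality encoding, and all threshold values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
import re


@dataclass
class Read:
    """Hypothetical container for one FASTQ record."""
    name: str
    seq: str
    qual: str  # quality string, assumed to be Phred+33 encoded


def mean_quality(read: Read, offset: int = 33) -> float:
    """Mean Phred score of a read (0.0 for an empty read)."""
    if not read.qual:
        return 0.0
    return sum(ord(c) - offset for c in read.qual) / len(read.qual)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    runs = (len(m.group(0)) for m in re.finditer(r"(.)\1*", seq.upper()))
    return max(runs, default=0)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def passes_qc(read: Read, min_mean_q: float = 20.0, min_len: int = 50,
              max_homopolymer: int = 8, max_n: int = 0) -> bool:
    """Apply the four filters; all default thresholds are illustrative."""
    return (
        len(read.seq) >= min_len
        and mean_quality(read) >= min_mean_q
        and longest_homopolymer(read.seq) <= max_homopolymer
        and read.seq.upper().count("N") <= max_n
    )


# Toy reads, each failing at most one criterion ('I' is Q40, '!' is Q0).
reads = [
    Read("good", "ACGTACGTAC", "IIIIIIIIII"),
    Read("lowq", "ACGTACGTAC", "!!!!!!!!!!"),
    Read("short", "ACG", "III"),
    Read("homo", "ACAAAAAAGT", "IIIIIIIIII"),
    Read("ambig", "ACGTNCGTAC", "IIIIIIIIII"),
]
kept = [r.name for r in reads
        if passes_qc(r, min_mean_q=20, min_len=5, max_homopolymer=4)]
print(kept)  # ['good']
```

The sketch discards whole reads that fail a criterion. A trimming tool may instead clip low-quality or ambiguous ends and keep the remainder, which is a different design choice from the one shown.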

Keywords: NGS; downstream processing; graphical interface; quality control.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • High-Throughput Nucleotide Sequencing / methods
  • High-Throughput Nucleotide Sequencing / standards*
  • Quality Control*
  • Reproducibility of Results
  • Sequence Analysis, DNA / methods
  • Sequence Analysis, DNA / standards*
  • Software / standards*