Enhancing discoveries of molecular QTL studies with small sample size using summary statistic imputation

Tao Wang; Yongzhuang Liu; Quanwei Yin; Jiaquan Geng; Jin Chen; Xipeng Yin; Yongtian Wang; Xuequn Shang; Chunwei Tian; Yadong Wang; Jiajie Peng

doi:10.1093/bib/bbab370

Brief Bioinform. 2022 Jan 17;23(1):bbab370. doi: 10.1093/bib/bbab370.

Authors

Tao Wang¹ ² ³, Yongzhuang Liu³, Quanwei Yin¹ ², Jiaquan Geng¹ ², Jin Chen⁴, Xipeng Yin⁵, Yongtian Wang¹ ², Xuequn Shang¹ ², Chunwei Tian⁶, Yadong Wang³, Jiajie Peng¹ ²

Affiliations

¹ School of Computer Science, Northwestern Polytechnical University, 1 Dongxiang Rd, 710129, Xi'an, China.
² Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology, Northwestern Polytechnical University, 1 Dongxiang Rd, 710129, Xi'an, China.
³ School of Computer Science and Technology, Harbin Institute of Technology, 92 West Dazhi St., 150001, Harbin, China.
⁴ Institute for Biomedical Informatics, University of Kentucky, Lexington, 40536, KY, USA.
⁵ School of Software, Northwestern Polytechnical University, 1 Dongxiang Road, 710129, Xi'an, China.
⁶ Northwestern Polytechnical University, 1 Dongxiang Road, 710129, Xi'an, China.

PMID: 34545927
DOI: 10.1093/bib/bbab370

Abstract

Quantitative trait locus (QTL) analyses of multiomic molecular traits, such as gene transcription (eQTL), DNA methylation (mQTL) and histone modification (haQTL), have been widely used to infer the functional effects of genome variants. However, QTL discovery is largely restricted by limited study sample size, which demands a higher minor allele frequency threshold and therefore leaves many molecular trait-variant associations missing. This is especially prominent in single-cell level molecular QTL studies because of sample availability and cost. A method that addresses this problem is urgently needed to enhance the discoveries of current molecular QTL studies with small sample size. In this study, we present an efficient computational framework called xQTLImp to impute missing molecular QTL associations. In the local-region imputation, xQTLImp uses a multivariate Gaussian model to impute the missing associations by leveraging the known association statistics of variants and the surrounding linkage disequilibrium (LD). In the genome-wide imputation, novel procedures are implemented to improve efficiency, including dynamically constructing a reused LD buffer, adopting multiple heuristic strategies and parallel computing. Experiments on various multiomic bulk and single-cell sequencing-based QTL datasets demonstrate the high imputation accuracy and novel QTL discovery ability of xQTLImp. A C++ software package is freely available at https://github.com/stormlovetao/QTLIMP.
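To make the local-region step concrete, the following is a minimal sketch of generic Gaussian summary-statistic imputation, not the xQTLImp implementation itself. It assumes that association z-scores of variants in a region are jointly Gaussian with covariance given by the LD correlation matrix, so that the z-score of an unmeasured variant can be estimated as its conditional mean given the measured ones. The function names `solve` and `impute_z`, the `ridge` regularization term and the quality score `r2pred` are illustrative assumptions and do not come from the abstract.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def impute_z(z_known, ld_kk, ld_uk, ridge=0.1):
    """Impute one missing z-score from known z-scores and LD.

    z_known: z-scores of variants with known association statistics
    ld_kk:   LD correlation matrix among the known variants
    ld_uk:   LD correlations between the missing variant and each known one
    ridge:   hypothetical regularization added to the diagonal of ld_kk
             to stabilize the solve when LD is estimated from few samples
    """
    n = len(z_known)
    reg = [[ld_kk[i][j] + (ridge if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    # Weights w = (ld_kk + ridge * I)^-1 ld_uk.
    w = solve(reg, ld_uk)
    # Conditional mean of the missing z-score given the known ones.
    z_imputed = sum(wi * zi for wi, zi in zip(w, z_known))
    # Approximate imputation quality (fraction of variance explained).
    r2pred = sum(wi * ri for wi, ri in zip(w, ld_uk))
    return z_imputed, r2pred


# Toy region: two known variants and one missing variant (made-up values).
z_hat, quality = impute_z(
    z_known=[4.2, 3.1],
    ld_kk=[[1.0, 0.6], [0.6, 1.0]],
    ld_uk=[0.9, 0.5],
)
print(z_hat, quality)
```

In a setting like this, imputed statistics with a low quality score would typically be filtered out before downstream QTL analysis; the appropriate cutoff is a design choice that this sketch does not specify.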

Keywords: QTL analysis; imputation framework; single-cell; small sample size; summary statistics.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Genome-Wide Association Study* / methods
Genotype
Linkage Disequilibrium
Phenotype
Polymorphism, Single Nucleotide
Quantitative Trait Loci*
Sample Size