A fast algorithm to factorize high-dimensional tensor product matrices used in genetic models

G3 (Bethesda). 2024 Mar 6;14(3):jkae001. doi: 10.1093/g3journal/jkae001.

Abstract

Many genetic models (including models for epistatic effects as well as genetic-by-environment interactions) involve covariance structures that are Hadamard products of lower-rank matrices. Implementing these models requires factorizing large Hadamard product matrices. The available factorization algorithms do not scale well to big data, which makes some of these models infeasible with large sample sizes. Here, based on properties of Hadamard products and (related) Kronecker products, we propose an algorithm that produces an approximate decomposition and is orders of magnitude faster than the standard eigenvalue decomposition. In this article, we describe the algorithm, show how it can be used to factorize large Hadamard product matrices, present benchmarks, and illustrate the method with an analysis of data from the northern testing locations of the G × E project of the Genomes to Fields Initiative (n ∼ 60,000). We implemented the proposed algorithm in the open-source "tensorEVD" R package.
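
The abstract does not spell out which properties of Hadamard and Kronecker products the algorithm relies on. As a rough illustration only, the sketch below checks two standard linear-algebra facts that link the two products: a Hadamard product of index-expanded kernels is a submatrix of the Kronecker product of the small kernels, and eigenpairs of a Kronecker product are products of the eigenpairs of its factors. The matrices K1 and K2, the index vectors id1 and id2, and the helper names are all hypothetical. This is not the tensorEVD implementation or its API.

```python
# Illustrative sketch (pure Python, exact integer arithmetic).
# All names and values here are hypothetical, not taken from tensorEVD.

def kronecker(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Two small symmetric kernels (assumed: 2 genotypes, 3 environments).
K1 = [[2, 1],
      [1, 2]]
K2 = [[3, 1, 1],
      [1, 3, 1],
      [1, 1, 3]]

# Each of the n = 5 records maps to one level of K1 and one level of K2.
id1 = [0, 1, 1, 0, 1]
id2 = [2, 0, 1, 1, 2]
n = len(id1)

# Hadamard product of the two index-expanded n x n kernels.
hadamard = [[K1[id1[a]][id1[b]] * K2[id2[a]][id2[b]] for b in range(n)]
            for a in range(n)]

# The same matrix read off the Kronecker product: record a corresponds
# to row/column id1[a] * ncol(K2) + id2[a] of K1 (x) K2.
kron = kronecker(K1, K2)
rows = [id1[a] * len(K2) + id2[a] for a in range(n)]
submatrix = [[kron[r][c] for c in rows] for r in rows]
assert hadamard == submatrix

# Eigenpairs of the Kronecker product follow from those of the factors:
# K1 v1 = 3 v1 and K2 v2 = 2 v2 imply (K1 (x) K2)(v1 (x) v2) = 6 (v1 (x) v2).
v1, lam1 = [1, 1], 3
v2, lam2 = [1, -1, 0], 2
v = [a * b for a in v1 for b in v2]
assert matvec(kron, v) == [lam1 * lam2 * x for x in v]
```

Under these assumptions, the eigenvalue decompositions of the two small kernels already describe the much larger Kronecker structure that contains the Hadamard product, which is one plausible reading of why the full n × n eigenvalue decomposition can be avoided.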

Keywords: R package; covariance matrix; eigenvalue decomposition; genetic model.

MeSH terms

  • Algorithms*
  • Genome
  • Models, Genetic*
  • Sample Size