Penalized mediation models for multivariate data

Genet Epidemiol. 2022 Feb;46(1):32-50. doi: 10.1002/gepi.22433. Epub 2021 Oct 19.

Abstract

Statistical methods to integrate multiple layers of data, from exposures to intermediate traits to outcome variables, are needed to guide interpretation of complex data sets in which variables likely contribute to a causal pathway from exposure to outcome. Statistical mediation analysis based on structural equation models provides a general modeling framework, yet it can be difficult to apply to high-dimensional data and does not automatically select the best-fitting model. To overcome these limitations, we developed novel algorithms and software to simultaneously evaluate multiple exposure variables, multiple intermediate traits, and multiple outcome variables. Our penalized mediation models are computationally efficient, and simulations demonstrate that they produce reliable results for large data sets. Application of our methods to a study of vascular disease demonstrates their utility in identifying novel direct effects of single-nucleotide polymorphisms (SNPs) on coronary heart disease and peripheral artery disease, while disentangling the effects of SNPs on intermediate risk factors, including lipids, cigarette smoking, systolic blood pressure, and type 2 diabetes.
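The abstract does not spell out the estimation algorithm. As a rough illustration of the general idea behind L1-penalized mediation with several exposures and mediators, the sketch below fits separate lasso regressions by coordinate descent: each mediator on the exposures (coefficients alpha), then the outcome on exposures and mediators (direct effects delta, mediator effects beta). Indirect effects are then formed as products alpha * beta. This is a simplified two-stage stand-in, not the authors' joint penalized structural equation model. The function names, the penalty value `lam`, and the simulated data are all hypothetical.

```python
import random


def soft_threshold(z, lam):
    """Lasso shrinkage operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def center(cols):
    """Subtract the mean from each column (data stored column by column)."""
    return [[v - sum(c) / len(c) for v in c] for c in cols]


def lasso_cd(cols, y, lam, n_iter=200):
    """Coordinate-descent lasso minimizing (1/2n)||y - Xb||^2 + lam * ||b||_1.
    `cols` holds centered predictors column by column; `y` is centered."""
    n, p = len(y), len(cols)
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    sq = [sum(v * v for v in c) / n for c in cols]
    for _ in range(n_iter):
        for j in range(p):
            if sq[j] == 0.0:
                continue
            c = cols[j]
            # inner product of column j with the residual that excludes column j
            rho = sum(c[i] * (resid[i] + c[i] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / sq[j]
            diff = new - b[j]
            if diff != 0.0:
                for i in range(n):
                    resid[i] -= c[i] * diff
                b[j] = new
    return b


def penalized_mediation(X, M, y, lam):
    """Hypothetical two-stage sketch. X: exposure columns, M: mediator columns,
    y: outcome. Returns alpha[k][j] (X_j -> M_k), beta[k] (M_k -> Y),
    delta[j] (direct X_j -> Y) and indirect[j][k] = alpha[k][j] * beta[k]."""
    X, M = center(X), center(M)
    y = center([y])[0]
    alpha = [lasso_cd(X, m, lam) for m in M]
    coef = lasso_cd(X + M, y, lam)
    delta, beta = coef[:len(X)], coef[len(X):]
    indirect = [[alpha[k][j] * beta[k] for k in range(len(M))]
                for j in range(len(X))]
    return alpha, beta, delta, indirect


# Simulated example: x1 acts on y only through m1; x2 acts on y directly;
# m2 is pure noise.
random.seed(1)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
m1 = [0.8 * a + random.gauss(0, 1) for a in x1]
m2 = [random.gauss(0, 1) for _ in range(n)]
y = [0.7 * m + 0.5 * b + random.gauss(0, 1) for m, b in zip(m1, x2)]

alpha, beta, delta, indirect = penalized_mediation([x1, x2], [m1, m2], y, lam=0.05)
print("alpha:", alpha)
print("beta:", beta)
print("delta:", delta)
print("indirect:", indirect)
```

In this toy setting the estimates should land near the simulated values (shrunk slightly toward zero by the penalty), with the x1 effect routed through m1 and the x2 effect appearing as a direct effect. Fitting the two stages separately keeps the code short; a joint model, as the abstract describes, would instead estimate all paths together under a shared penalty.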

Keywords: L1 penalty; cardiovascular disease; data integration; mediation analysis; structural equation model.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Diabetes Mellitus, Type 2* / genetics
  • Humans
  • Models, Genetic
  • Models, Statistical
  • Phenotype
  • Software