Characterizing substructure via mixture modeling in large-scale genetic summary statistics

Hayley R Stoneman; Adelle Price; Nikole Scribner Trout; Riley Lamont; Souha Tifour; Nikita Pozdeyev; Colorado Center for Personalized Medicine; Kristy Crooks; Meng Lin; Nicholas Rafaels; Christopher R Gignoux; Katie M Marker; Audrey E Hendricks

doi:10.1101/2024.01.29.577805

bioRxiv [Preprint]. 2024 May 13:2024.01.29.577805. doi: 10.1101/2024.01.29.577805.

Abstract

Genetic summary data are broadly accessible and highly useful, including for risk prediction, causal inference, fine mapping, and incorporation of external controls. However, collapsing individual-level data into groups masks intra- and inter-sample heterogeneity, leading to confounding, reduced power, and bias. Ultimately, unaccounted-for substructure limits the usability of summary data, especially for understudied or admixed populations. Here, we present Summix2, a comprehensive set of methods and software based on a computationally efficient mixture model to estimate and adjust for substructure in genetic summary data. In extensive simulations and application to public data, Summix2 characterizes finer-scale population structure, identifies ascertainment bias, and identifies potential regions of selection due to local substructure deviation. Summix2 increases the robust use of diverse publicly available summary data, resulting in improved and more equitable research.
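
To make the idea of a mixture model over summary data concrete, the sketch below estimates mixing proportions from allele frequencies alone. It is an illustration under stated assumptions, not the Summix2 implementation: it assumes the observed allele frequency at each variant is a weighted average of reference-group allele frequencies, and it fits the weights by least squares constrained to be non-negative and to sum to one, using projected gradient descent. The function names, the simulated data, and the choice of optimizer are all hypothetical.

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_k >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        t = (cumulative - 1.0) / i
        if u_i - t > 0:
            theta = t  # keep the threshold from the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(observed, references, steps=2000):
    """Fit mixing proportions by constrained least squares (hypothetical helper).

    observed:   allele frequency per variant in the summary data
    references: one list of allele frequencies per reference group
    """
    k, n = len(references), len(observed)
    p = [1.0 / k] * k  # start from equal proportions
    # Step size from an upper bound on the gradient's Lipschitz constant.
    lipschitz = (2.0 / n) * sum(f * f for ref in references for f in ref)
    step = 1.0 / lipschitz
    for _ in range(steps):
        residual = [
            sum(p[j] * references[j][i] for j in range(k)) - observed[i]
            for i in range(n)
        ]
        grad = [
            (2.0 / n) * sum(residual[i] * references[j][i] for i in range(n))
            for j in range(k)
        ]
        p = project_to_simplex([p[j] - step * grad[j] for j in range(k)])
    return p


# Simulated example (all values are made up for illustration).
random.seed(0)
n_variants = 200
references = [[random.random() for _ in range(n_variants)] for _ in range(3)]
true_props = [0.6, 0.3, 0.1]
observed = [
    sum(true_props[j] * references[j][i] for j in range(3))
    for i in range(n_variants)
]

estimated = estimate_proportions(observed, references)
print([round(x, 4) for x in estimated])
```

Because the simulated observed frequencies are an exact mixture of the references, the fitted proportions should closely match the simulated ones; with real summary data, sampling noise and imperfectly matched reference panels would leave residual error.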

Publication types

Preprint
