An Expectation-Maximization Algorithm for Combining a Sample of Partially Overlapping Covariance Matrices

Deniz Akdemir; Mohamed Somo; Julio Isidro-Sanchéz

doi:10.3390/axioms12020161

Axioms. 2023 Feb;12(2):161. doi: 10.3390/axioms12020161. Epub 2023 Feb 4.

Authors

Deniz Akdemir¹, Mohamed Somo², Julio Isidro-Sanchéz³

Affiliations

¹ Center of International Bone Marrow Transplantation Research, Minneapolis, MN 55401-1206, USA.
² Syngenta Seeds, Junction City, KS 66441, USA.
³ Centro de Biotecnología y Genómica de Plantas, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, Universidad Politécnica de Madrid, 28223, Madrid, Spain.

Abstract

The generation of unprecedented amounts of data brings new challenges in data management, but also an opportunity to accelerate the identification of processes in many scientific disciplines. One of these challenges is the harmonization of high-dimensional, unbalanced, and heterogeneous data. In this manuscript, we propose a statistical approach to combining incomplete and partially overlapping pieces of covariance matrices that come from independent experiments. We assume that the data are a random sample of partial covariance matrices sampled from Wishart distributions, and we derive an expectation-maximization algorithm for parameter estimation. We demonstrate the properties of our method through (i) simulation studies and (ii) analyses of empirical datasets. In general, the ability to make inferences about the covariance of variables not observed in the same experiment is a valuable tool for data analysis, since covariance estimation is an important step in many statistical applications, such as multivariate analysis, principal component analysis, factor analysis, and structural equation modeling.
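To make the idea concrete, here is a minimal sketch of a generic EM scheme for this kind of problem. It is an illustration under stated assumptions, not necessarily the exact updates derived in the paper. Assumptions: each experiment k observes a variable subset a with degrees of freedom ν, and ν·S_k ~ Wishart(ν, Σ_aa). In the E-step, the unobserved blocks of each full scatter matrix W are replaced by their conditional expectations given the observed block, E[W_ba | W_aa] = B W_aa and E[W_bb | W_aa] = ν Σ_bb·a + B W_aa Bᵀ, where B = Σ_ba Σ_aa⁻¹ and Σ_bb·a = Σ_bb − B Σ_ab. The M-step pools the completed matrices, Σ = Σ_k E[W_k] / Σ_k ν_k. The function name `em_combine_covariances`, the `(idx, S, nu)` input format, and the identity starting value are hypothetical choices made for this example.

```python
import numpy as np


def em_combine_covariances(blocks, p, n_iter=1000, tol=1e-10):
    """Hypothetical EM sketch for combining partially overlapping covariances.

    blocks: list of (idx, S, nu); idx lists the variables observed in one
    experiment, S is that experiment's covariance matrix over idx (in idx
    order), and nu is its degrees of freedom. Assumed model:
    nu * S ~ Wishart(nu, Sigma[idx][:, idx]).
    Every variable must appear in at least one block.
    """
    sigma = np.eye(p)  # assumed starting value: identity
    total_nu = sum(nu for _, _, nu in blocks)
    for _ in range(n_iter):
        acc = np.zeros((p, p))
        for idx, S, nu in blocks:
            a = np.asarray(idx)
            b = np.setdiff1d(np.arange(p), a)  # variables missing here
            W_aa = nu * np.asarray(S)  # observed scatter matrix
            W = np.zeros((p, p))
            W[np.ix_(a, a)] = W_aa
            if b.size:
                # E-step: regression of missing on observed variables,
                # B = Sigma_ba Sigma_aa^{-1} (computed via a linear solve).
                B = np.linalg.solve(sigma[np.ix_(a, a)], sigma[np.ix_(a, b)]).T
                # Conditional covariance of missing given observed (Schur complement).
                S_bb_a = sigma[np.ix_(b, b)] - B @ sigma[np.ix_(a, b)]
                W_ba = B @ W_aa
                W[np.ix_(b, a)] = W_ba
                W[np.ix_(a, b)] = W_ba.T
                W[np.ix_(b, b)] = nu * S_bb_a + B @ W_aa @ B.T
            acc += W
        # M-step: pooled estimate from the completed scatter matrices.
        new_sigma = acc / total_nu
        if np.max(np.abs(new_sigma - sigma)) < tol:
            return new_sigma
        sigma = new_sigma
    return sigma


# Usage: three experiments, each observing only two of three variables.
true_sigma = np.array([[2.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.5]])
pieces = [(i, true_sigma[np.ix_(i, i)], 10.0) for i in ([0, 1], [1, 2], [0, 2])]
estimate = em_combine_covariances(pieces, p=3)
print(np.round(estimate, 4))
```

In this noise-free toy case, every pair of variables is observed in some experiment, so the pooled estimate should recover `true_sigma`. With real, noisy pieces, pairs observed in several experiments are combined with ν-weighting. Pairs never observed together are filled in only through the model, so their estimates depend on how the overlaps connect the variables.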

Keywords: 62H12; 62P10; 62H20; covariance estimation; expectation-maximization; heterogeneous databases; imputation; multi-view data.

Grants and funding

U24 CA076518/CA/NCI NIH HHS/United States