The C-SHIFT Algorithm for Normalizing Covariances

Evgenia Chunikhina; Paul Logan; Yevgeniy Kovchegov; Anatoly Yambartsev; Debashis Mondal; Andrey Morgun

doi:10.1109/TCBB.2022.3151840

IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):720-730. doi: 10.1109/TCBB.2022.3151840. Epub 2023 Feb 3.

PMID: 35167480

Abstract

Omics technologies are powerful tools for analyzing patterns in gene expression data for thousands of genes. Because of systematic variation between experiments, raw gene expression data are often obscured by undesirable technical noise. Various normalization techniques have been designed to remove these non-biological errors prior to any statistical analysis. One reason for normalizing data is the need to recover the covariance matrix used in gene network analysis. In this paper, we introduce a novel normalization technique, called the covariance shift (C-SHIFT) method. This normalization algorithm uses optimization techniques together with the blessing-of-dimensionality philosophy and an energy minimization hypothesis for covariance matrix recovery under additive noise (known in biology as the bias). Thus, it is perfectly suited for the analysis of logarithmic gene expression data. Numerical experiments on synthetic data demonstrate the method's advantage over classical normalization techniques; specifically, it is compared with the Rank, Quantile, cyclic LOESS (locally estimated scatterplot smoothing), and MAD (median absolute deviation) normalization methods. We also evaluate the performance of the C-SHIFT algorithm on real biological data.
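
The effect of additive noise on a covariance matrix can be illustrated with a small numerical sketch. This is not the C-SHIFT algorithm itself, which the abstract does not specify. It only shows, under assumed conditions, how a per-sample additive bias changes the gene-by-gene sample covariance. The function name `shifted_covariance_demo`, the Gaussian data, and the bias scale are all hypothetical choices made for illustration.

```python
import numpy as np

def shifted_covariance_demo(n_genes=5, n_samples=200, seed=0):
    """Show how an additive per-sample bias changes a gene covariance matrix.

    Illustrative assumptions (not from the paper): Gaussian "true" data,
    one bias value per sample that is added to every gene in that sample.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_genes, n_samples))  # hypothetical true log-expression
    b = rng.normal(scale=0.5, size=n_samples)  # hypothetical per-sample bias
    y = x + b                                  # broadcasting: b[k] added to sample k

    cov_x = np.cov(x)  # rows are genes, columns are samples
    cov_y = np.cov(y)

    # Sample covariance of each gene with the bias, and sample variance of the bias.
    xc = x - x.mean(axis=1, keepdims=True)
    bc = b - b.mean()
    cross = xc @ bc / (n_samples - 1)
    var_b = bc @ bc / (n_samples - 1)

    # Exact identity: cov(Y)_ij = cov(X)_ij + cross_i + cross_j + var(b).
    predicted = cov_x + cross[:, None] + cross[None, :] + var_b
    return cov_x, cov_y, predicted, var_b

if __name__ == "__main__":
    cov_x, cov_y, predicted, var_b = shifted_covariance_demo()
    print("identity holds:", np.allclose(cov_y, predicted))
    print("mean entrywise change:", (cov_y - cov_x).mean())
    print("bias variance:", var_b)
```

When the bias is independent of the true expression, the cross terms are small in large samples, so every entry of the observed covariance matrix is offset by roughly the same constant, the variance of the bias. Recovering the true matrix then amounts to estimating that offset, which is the kind of problem the abstract describes.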

Publication types

Research Support, U.S. Gov't, Non-P.H.S.
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Gene Expression Profiling* / methods