Normalization of Large-Scale Transcriptome Data Using Heuristic Methods

Arthur Yosef; Eli Shnaider; Moti Schneider; Michael Gurevich

doi:10.1177/11779322231160397

Bioinform Biol Insights. 2023 Mar 31:17:11779322231160397. doi: 10.1177/11779322231160397. eCollection 2023.

Authors

Arthur Yosef¹, Eli Shnaider², Moti Schneider², Michael Gurevich³

Affiliations

¹ Tel Aviv-Yaffo Academic College, Yaffo, Israel.
² Netanya Academic College, Netanya, Israel.
³ Tel Aviv University, Tel Aviv, Israel.

Abstract

In this study, we introduce an artificial intelligence method for addressing the batch effect in transcriptome data. The method has several clear advantages over the alternative methods presently in use. Batch effect refers to the discrepancy between gene expression data series measured under different conditions. While data from the same batch (measurements performed under the same conditions) are compatible, combining various batches into one data set is problematic because the measurements are incompatible. Therefore, it is necessary to correct the combined data (normalization) before performing biological analysis. There are numerous methods that attempt to correct a data set for batch effect. These methods rely on various assumptions regarding the distribution of the measurements. Forcing the data elements into a presupposed distribution can severely distort biological signals, thus leading to incorrect results and conclusions. The wider the discrepancy between the assumed and the actual data distribution, the greater the biases introduced by such "correction methods." We introduce a heuristic method to reduce batch effect. The method does not rely on any assumptions regarding the distribution or the behavior of data elements. Hence, it does not introduce any new biases in the process of correcting the batch effect. It strictly maintains the integrity of measurements within the original batches.

Keywords: Data mining; cluster construction; gene expressions; heuristic methods; soft computing.