An Iterative Unsupervised Method for Gene Expression Differentiation

Genes (Basel). 2023 Feb 4;14(2):412. doi: 10.3390/genes14020412.

Abstract

For several decades, understanding gene activity and its role in the lives of organisms has been a research focus for scientists in many fields. Part of this work is the analysis of gene expression data to select differentially expressed genes. Methods that identify the genes of interest have been proposed on the basis of statistical data analysis. The problem is that these methods agree poorly: distinct methods produce different results. By taking advantage of unsupervised data analysis, an iterative clustering procedure that finds differentially expressed genes has shown promising results. In the present paper, a comparative study of clustering methods applied to gene expression analysis is presented to justify the choice of the clustering algorithm implemented in the method. Different distance measures are investigated to reveal those that increase the efficiency of the method in finding the real data structure. Further, the method is improved by incorporating an additional aggregation measure based on the standard deviation of the expression levels. Its use improves gene distinction, as additional differentially expressed genes are found. The method is summarized in a detailed procedure. The significance of the method is demonstrated by an analysis of two mouse strain data sets. The differentially expressed genes identified by the proposed method are compared with those selected by well-known statistical methods applied to the same data.
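The abstract does not spell out the steps of the procedure, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each gene is summarized by two aggregation measures (the difference of group means and the standard deviation of its expression levels), that plain Euclidean distance is used, and that genes lying in sparse regions of this feature space are treated as candidate differentially expressed genes. The density test is a simplified neighbour count, not a full density-based clustering algorithm, and all names and parameters (`gene_features`, `sparse_points`, `iterative_de_candidates`, `eps`, `min_pts`, `max_iter`) are hypothetical rather than taken from the paper.

```python
from math import dist
from statistics import mean, pstdev


def gene_features(group_a, group_b):
    """Aggregate one gene's samples into (mean difference, overall SD).

    Assumption: these two measures stand in for the paper's aggregation
    measures; the actual definitions may differ.
    """
    samples = list(group_a) + list(group_b)
    return (mean(group_b) - mean(group_a), pstdev(samples))


def sparse_points(points, eps, min_pts):
    """Return the keys of points with fewer than min_pts neighbours within eps.

    This is a simplified density test, not a complete density-based
    clustering algorithm.
    """
    flagged = set()
    for key, p in points.items():
        neighbours = sum(
            1 for other, q in points.items()
            if other != key and dist(p, q) <= eps
        )
        if neighbours < min_pts:
            flagged.add(key)
    return flagged


def iterative_de_candidates(expr, eps=0.5, min_pts=3, max_iter=10):
    """Iteratively peel off genes that lie in sparse regions.

    expr maps a gene name to (samples in condition A, samples in condition B).
    Genes flagged in any iteration are returned as candidates.
    """
    remaining = {g: gene_features(a, b) for g, (a, b) in expr.items()}
    candidates = set()
    for _ in range(max_iter):
        flagged = sparse_points(remaining, eps, min_pts)
        if not flagged:
            break  # the remaining genes form a dense group; stop iterating
        candidates |= flagged
        for g in flagged:
            del remaining[g]
    return candidates
```

In this sketch the loop stops as soon as an iteration flags no new genes. The values of `eps` and `min_pts` are illustrative and would need tuning to the scale of real expression data, and a different distance measure could be swapped in for `dist`.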

Keywords: clustering analysis; density-based clustering; differentially expressed genes; gene expression data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Cluster Analysis
  • Data Interpretation, Statistical
  • Gene Expression
  • Gene Expression Profiling* / methods
  • Mice

Grants and funding

This research was supported by the GATE project, funded by the European Union’s Horizon 2020 WIDESPREAD-2018-2020 TEAMING Phase 2 programme under grant No. 857155, and by the Operational Programme Science and Education for Smart Growth under grant No. BG05M2OP001-1.003-0002-C01.