Analysis of genetic association using hierarchical clustering and cluster validation indices

Genomics. 2017 Oct;109(5-6):438-445. doi: 10.1016/j.ygeno.2017.06.009. Epub 2017 Jul 8.

Abstract

Co-expressed genes are commonly assumed to be co-regulated in the underlying regulatory network, so determining sets of co-expressed genes according to some similarity criterion is an important task. This task is usually performed by clustering algorithms, which group genes by their expression values across a set of experiments. In this work, we propose a method to find sets of co-expressed genes that uses cluster validation indices as a measure of similarity for individual gene groups and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated data and on real genomics data, measuring performance as its ability to detect co-regulated sets relative to a full search. Additionally, we analyzed the quality of the best-ranked groups using an online bioinformatics tool that provides network information for the selected genes.
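The general idea of generating candidate groups with several hierarchical clustering variants and ranking each group by a validation index can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not name the linkage variants or the indices used, so the three linkage methods, the correlation distance, the silhouette-style per-group score, and all function names (`candidate_groups`, `group_score`, `rank_groups`, `simulate_expression`) are assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform


def candidate_groups(expr, methods=("average", "complete", "single"), min_size=3):
    """Collect candidate gene groups from several hierarchical clusterings.

    expr is a (genes x experiments) array. The linkage methods are assumed
    choices, not taken from the paper.
    """
    dist = pdist(expr, metric="correlation")  # 1 - Pearson correlation
    n = expr.shape[0]
    groups = set()
    for method in methods:
        tree = linkage(dist, method=method)
        # Cut the dendrogram at every level and keep each cluster as a candidate.
        for k in range(2, n):
            labels = fcluster(tree, t=k, criterion="maxclust")
            for lab in np.unique(labels):
                members = np.flatnonzero(labels == lab)
                if min_size <= members.size < n:
                    groups.add(frozenset(members.tolist()))
    return groups, squareform(dist)


def group_score(members, dist):
    """Silhouette-style index for one group (an assumed stand-in index).

    a = mean distance between genes inside the group,
    b = mean distance from genes in the group to genes outside it.
    """
    inside = np.array(sorted(members))
    outside = np.setdiff1d(np.arange(dist.shape[0]), inside)
    sub = dist[np.ix_(inside, inside)]
    a = sub.sum() / (inside.size * (inside.size - 1))  # diagonal is zero
    b = dist[np.ix_(inside, outside)].mean()
    return (b - a) / max(a, b)


def rank_groups(expr, **kwargs):
    """Return (score, members) pairs sorted from best to worst."""
    groups, dist = candidate_groups(expr, **kwargs)
    scored = [(group_score(g, dist), sorted(g)) for g in groups]
    return sorted(scored, key=lambda item: item[0], reverse=True)


def simulate_expression(n_genes=30, n_experiments=20, n_planted=5, seed=0):
    """Toy data: the first n_planted genes share one signal, the rest are noise."""
    rng = np.random.default_rng(seed)
    expr = rng.normal(size=(n_genes, n_experiments))
    signal = rng.normal(size=n_experiments)
    expr[:n_planted] = signal + 0.1 * rng.normal(size=(n_planted, n_experiments))
    return expr


expr = simulate_expression()
ranking = rank_groups(expr)
best_score, best_members = ranking[0]
print(best_score, best_members)
```

On toy data like this, the top-ranked group should consist of the planted correlated genes, which mirrors the kind of simulated-data check the abstract describes. Real expression data would need normalization and a principled choice of index, neither of which is covered here.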

Keywords: Association; Clustering; Genomics; Validation Indices.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Cluster Analysis
  • Computational Biology / methods*
  • Gene Expression Profiling / methods
  • Gene Regulatory Networks
  • Genetic Association Studies / methods*
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods