CORAZON: a web server for data normalization and unsupervised clustering based on expression profiles

BMC Res Notes. 2020 Jul 14;13(1):338. doi: 10.1186/s13104-020-05171-6.

Abstract

Objective: Data normalization and clustering are mandatory steps in gene expression analysis and in downstream analyses, respectively. However, user-friendly implementations of these methods are available only under expensive licensing agreements or as stand-alone scripts, which poses a major obstacle for users with limited computational skills.

Results: We developed an online tool called CORAZON (Correlations Analyses Zipper Online), which implements three unsupervised learning methods to cluster gene expression datasets in a user-friendly environment. It supports eight gene expression normalization/transformation methods and an assessment of attribute influence. Normalizations that require gene length can be applied only to RNA-seq data, whereas the others can be used with microarray and/or NanoString data. The performance of the clustering methods was evaluated on five models, with accuracies between 92 and 100%. We applied the tool to gain functional insights into non-coding RNAs (ncRNAs) through Gene Ontology enrichment of clusters in a dataset generated by the ENCODE project. Clusters in which most transcripts were coding genes were enriched in the Cellular, Metabolic, Transports, and Systems Development categories, whereas ncRNAs were enriched in the Detection of Stimulus, Sensory Perception, Immunological System, and Digestion categories. The CORAZON source code is freely available at https://gitlab.com/integrativebioinformatics/corazon and the web server can be accessed at http://corazon.integrativebioinformatics.me.
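The abstract does not name the eight normalizations or the three clustering methods, so the following is only a minimal sketch under stated assumptions, not CORAZON's implementation. It assumes TPM (transcripts per million) as a representative gene-length-dependent normalization, which illustrates why such methods are restricted to RNA-seq counts, a log2 transform as a length-independent alternative, and plain k-means as a generic unsupervised clustering step. The function names `tpm`, `log2_transform`, and `kmeans` are hypothetical and do not come from the CORAZON code base.

```python
import math
from typing import List, Sequence


def tpm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Transcripts per million for one sample (assumed example of a length-based method).

    Needs a gene length per feature, so it only makes sense for count data such as RNA-seq.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Reads per kilobase: divide each count by its gene length in kilobases.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rpk)
    if total == 0:
        raise ValueError("sample has no counts")
    # Rescale so that the values of this sample sum to one million.
    return [r * 1e6 / total for r in rpk]


def log2_transform(values: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Length-independent transform, usable with microarray or NanoString values too."""
    return [math.log2(v + pseudocount) for v in values]


def kmeans(points: List[List[float]], k: int, iterations: int = 100) -> List[int]:
    """Plain Lloyd's k-means on expression profiles; returns one cluster label per profile."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of profiles")
    # Deterministic initialization (an assumption): the first k profiles are the centroids.
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean profile of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical toy data: three genes with equal read density give equal TPM values.
    print(tpm([100, 200, 300], [1000, 2000, 3000]))
    # Four hypothetical gene profiles across three samples, forming two obvious groups.
    profiles = [[1.0, 1.1, 0.9], [1.2, 1.0, 1.0], [8.0, 8.2, 7.9], [7.8, 8.1, 8.0]]
    print(kmeans(profiles, 2))
```

In this toy run the first two profiles end up in one cluster and the last two in the other. Real tools typically use library implementations with randomized restarts rather than this fixed initialization, which is kept here only so the sketch is reproducible.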

Keywords: Clustering; Expression profiling; Gene expression; Machine learning; Non-coding RNAs; Normalization; Transcriptome analysis; Web server.

MeSH terms

  • Cluster Analysis
  • Computers*
  • Gene Expression Profiling
  • Gene Ontology
  • Internet
  • RNA, Untranslated
  • Software*

Substances

  • RNA, Untranslated