reString: an open-source Python software to perform automatic functional enrichment retrieval, results aggregation and data visualization

Sci Rep. 2021 Dec 6;11(1):23458. doi: 10.1038/s41598-021-02528-0.

Abstract

Functional enrichment analysis is an analytical method to extract biological insights from gene expression data, popularized by the ever-growing application of high-throughput techniques. Typically, expression profiles are generated for hundreds to thousands of genes/proteins from samples belonging to two experimental groups, and after ad-hoc statistical tests, researchers are left with lists of statistically significant entities, possibly lacking any unifying biological theme. Functional enrichment tackles the problem of putting overall gene expression changes into a broader biological context, based on pre-existing reference knowledge bases: database collections of known expression regulation, relationships and molecular interactions. STRING is among the most popular tools, providing both protein-protein interaction networks and functional enrichment analysis for any given set of identifiers. For complex experimental designs, retrieving, interpreting, analyzing and summarizing functional enrichment results is a daunting task, usually performed by hand by the average wet-biology researcher. We have developed reString, a cross-platform software that seamlessly retrieves functional enrichments from STRING for multiple user-supplied gene sets, with just a few clicks, without any need for specific bioinformatics skills. Further, it aggregates all findings into human-readable table summaries, with built-in features to easily produce user-customizable publication-grade clustermaps and bubble plots. Herein, we outline a complete reString protocol, showcasing its features on a real use-case.
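To make the aggregation step concrete, the following is a minimal sketch of how enrichment results from several gene sets might be merged into one term-by-gene-set table. It is not reString's actual code: the function name `aggregate_enrichments`, the record keys `term`, `description` and `fdr`, the gene-set labels and the 0.05 cutoff are all assumptions made for illustration, and the input numbers are invented.

```python
from collections import defaultdict


def aggregate_enrichments(results_by_set, fdr_cutoff=0.05):
    """Merge per-gene-set enrichment hits into one summary table.

    results_by_set maps a gene-set label to a list of dicts with the
    (hypothetical) keys 'term', 'description' and 'fdr'.
    Returns (labels, rows); each row is
    (term, description, [fdr or None for each label]).
    """
    labels = sorted(results_by_set)
    table = defaultdict(dict)   # term -> {label: fdr}
    descriptions = {}           # term -> human-readable name
    for label in labels:
        for hit in results_by_set[label]:
            # Keep only hits that pass the significance cutoff.
            if hit["fdr"] <= fdr_cutoff:
                table[hit["term"]][label] = hit["fdr"]
                descriptions[hit["term"]] = hit["description"]
    rows = [
        (term, descriptions[term], [table[term].get(label) for label in labels])
        for term in sorted(table)
    ]
    return labels, rows


# Toy input with invented FDR values, purely for illustration.
example = {
    "KO_vs_WT_up": [
        {"term": "GO:0006954", "description": "inflammatory response", "fdr": 0.001},
        {"term": "GO:0008283", "description": "cell population proliferation", "fdr": 0.20},
    ],
    "KO_vs_WT_down": [
        {"term": "GO:0006954", "description": "inflammatory response", "fdr": 0.03},
        {"term": "GO:0006629", "description": "lipid metabolic process", "fdr": 0.004},
    ],
}

labels, rows = aggregate_enrichments(example)
print("term", "description", *labels, sep="\t")
for term, description, values in rows:
    print(term, description, *values, sep="\t")
```

A table of this shape, with terms as rows and gene sets as columns, is the kind of input a clustermap or bubble plot would typically be drawn from; terms missing from a gene set are left as `None` rather than given a placeholder value.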

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Aorta / metabolism
  • Cluster Analysis*
  • Computational Biology / methods*
  • Data Mining / methods*
  • Databases, Genetic
  • Gene Expression Profiling / methods
  • Gene Expression Regulation*
  • Humans
  • Internet
  • Mice
  • Pattern Recognition, Automated*
  • Polymerase Chain Reaction
  • Programming Languages
  • Protein Interaction Maps
  • Proteins
  • RNA-Seq
  • Signal Transduction
  • Software
  • User-Computer Interface

Substances

  • Proteins