PDAUG: a Galaxy based toolset for peptide library analysis, visualization, and machine learning modeling

BMC Bioinformatics. 2022 May 28;23(1):197. doi: 10.1186/s12859-022-04727-6.

Abstract

Background: Computational methods based on initial screening and prediction of peptides for desired functions have proven to be effective alternatives to lengthy and expensive biochemical experimental methods traditionally utilized in peptide research, thus saving time and effort. However, for many researchers, the lack of expertise in utilizing programming libraries, access to computational resources, and flexible pipelines are big hurdles to adopting these advanced methods.

Results: To address the above mentioned barriers, we have implemented the peptide design and analysis under Galaxy (PDAUG) package, a Galaxy-based Python powered collection of tools, workflows, and datasets for rapid in-silico peptide library analysis. In contrast to existing methods like standard programming libraries or rigid single-function web-based tools, PDAUG offers an integrated GUI-based toolset, providing flexibility to build and distribute reproducible pipelines and workflows without programming expertise. Finally, we demonstrate the usability of PDAUG in predicting anticancer properties of peptides using four different feature sets and assess the suitability of various ML algorithms.

Conclusion: PDAUG offers tools for peptide library generation, data visualization, built-in and public database peptide sequence retrieval, peptide feature calculation, and machine learning (ML) modeling. Additionally, this toolset facilitates researchers to combine PDAUG with hundreds of compatible existing Galaxy tools for limitless analytic strategies.

MeSH terms

  • Algorithms
  • Machine Learning
  • Peptide Library*
  • Peptides / chemistry
  • Software*

Substances

  • Peptide Library
  • Peptides