Improving candidate Biosynthetic Gene Clusters in fungi through reinforcement learning

Hayda Almeida; Adrian Tsang; Abdoulaye Baniré Diallo

doi:10.1093/bioinformatics/btac420

Bioinformatics. 2022 Aug 10;38(16):3984-3991. doi: 10.1093/bioinformatics/btac420.

Authors

Hayda Almeida^{1,2,3}, Adrian Tsang^{1,2}, Abdoulaye Baniré Diallo^{1,3,4}

Affiliations

¹ Département d'Informatique, UQAM, Montréal, QC H2X 3Y7, Canada.
² Centre for Structural and Functional Genomics, Concordia University, Montréal, QC H4B 1R6, Canada.
³ Laboratoire d'Algèbre, de Combinatoire, et d'Informatique Mathématique (LACIM), UQAM, Montréal, QC H2X 3Y, Canada.
⁴ Centre of Excellence in Research on Orphan Diseases-Courtois Foundation (CERMO-FC), UQAM, Montréal, QC H2X 3Y7, Canada.

Abstract

Motivation: Precise identification of Biosynthetic Gene Clusters (BGCs) is a challenging task. The performance of BGC discovery tools is limited by their capacity to accurately predict the components belonging to candidate BGCs, and they often overestimate cluster boundaries. To support optimizing the composition and boundaries of candidate BGCs, we propose a reinforcement learning approach relying on protein domains and functional annotations from expert-curated BGCs.

Results: The proposed reinforcement learning method aims to improve candidate BGCs obtained with state-of-the-art tools. It was evaluated on candidate BGCs obtained for two fungal genomes, Aspergillus niger and Aspergillus nidulans. The results highlight an improvement in gene precision of more than 15% for TOUCAN, fungiSMASH and DeepBGC, and in cluster precision of more than 25% for fungiSMASH and DeepBGC, allowing these tools to obtain almost perfect precision in cluster prediction. This can pave the way for optimizing current predictions of candidate BGCs in fungi, while minimizing the curation effort required from domain experts.
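
As an illustration of why trimming overestimated boundaries raises precision, the sketch below computes a gene-level precision before and after removing a gene from a candidate cluster. It assumes gene precision means the fraction of predicted genes that also appear in a curated BGC; the abstract does not give the paper's exact definition, and the function name and gene identifiers are hypothetical.

```python
def gene_precision(predicted_genes, curated_genes):
    """Fraction of predicted genes that also appear in the curated set.

    Assumed definition for illustration; not taken from the paper.
    """
    predicted = set(predicted_genes)
    if not predicted:
        return 0.0
    return len(predicted & set(curated_genes)) / len(predicted)


# Hypothetical gene identifiers, for illustration only.
curated = ["g2", "g3", "g4"]
candidate = ["g1", "g2", "g3", "g4", "g5"]  # boundaries overestimated on both sides
trimmed = ["g2", "g3", "g4", "g5"]          # one flanking gene removed

precision_before = gene_precision(candidate, curated)
precision_after = gene_precision(trimmed, curated)
print(precision_before, precision_after)
```

Removing a gene that is absent from the curated cluster shrinks the denominator without changing the number of correct genes, so precision increases.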

Availability and implementation: https://github.com/bioinfoUQAM/RL-bgc-components.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Biosynthetic Pathways / genetics
Fungi* / genetics
Genome, Fungal
Multigene Family*

Grants and funding

Natural Sciences and Engineering Research Council (NSERC) and the Fonds de recherche du Québec-Nature et technologies (FRQNT)