CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification

Genes (Basel). 2023 Mar 3;14(3):634. doi: 10.3390/genes14030634.

Abstract

Fungal species identification from metagenomic data is a highly challenging task. Internal Transcribed Spacer (ITS) region is a potential DNA marker for fungi taxonomy prediction. Computational approaches, especially deep learning algorithms, are highly efficient for better pattern recognition and classification of large datasets compared to in silico techniques such as BLAST and machine learning methods. Here in this study, we present CNN_FunBar, a convolutional neural network-based approach for the classification of fungi ITS sequences from UNITE+INSDC reference datasets. Effects of convolution kernel size, filter numbers, k-mer size, degree of diversity and category-wise frequency of ITS sequences on classification performances of CNN models have been assessed at all taxonomic levels (species, genus, family, order, class and phylum). It is observed that CNN models can produce >93% average accuracy for classifying ITS sequences from balanced datasets with 500 sequences per category and 6-mer frequency features at all levels. The comparative study has revealed that CNN_FunBar can outperform machine learning-based algorithms (SVM, KNN, Naïve-Bayes and Random Forest) as well as existing fungal taxonomy prediction software (funbarRF, Mothur, RDP Classifier and SINTAX). The present study will be helpful for fungal taxonomy classification using large metagenomic datasets.

Keywords: CNN; KNN; Naïve-Bayes; SVM; UNITE; fungi ITS; k-mer; random forest; taxonomy; topsis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Bayes Theorem
  • Fungi / genetics
  • Neural Networks, Computer
  • Software*

Grants and funding

Authors are thankful to Indian Council of Agricultural Research, Ministry of Agriculture and Farmers’ Welfare, Govt. of India for providing financial assistance from the Network Project on Agricultural Bioinformatics and Computational Biology to carry out this present research work.