MarkerML - Marker Feature Identification in Metagenomic Datasets Using Interpretable Machine Learning

J Mol Biol. 2022 Jun 15;434(11):167589. doi: 10.1016/j.jmb.2022.167589. Epub 2022 Apr 18.

Abstract

Identification of environment-specific marker features is one of the key objectives of many metagenomic studies. The aim is to identify features in microbiome datasets that may serve as markers of contrasting or comparable states. Hypothesis testing and black-box machine-learned models, which are conventionally used to identify these features, are generally not exhaustive, especially because they do not provide any quantifiable relevance (context) of, or between, the identified features. We present the MarkerML web server, which seeks to leverage the emergence of interpretable machine learning to facilitate the contextual discovery of metagenomic features of interest. It does so through a comprehensive and automated application of Shapley Additive Explanations (SHAP) alongside compositionality-aware hypothesis testing for multivariate microbiome datasets. MarkerML not only helps identify marker features but also provides insight into the role and interdependence of the identified features in driving the decisions of the supervised machine-learned model. The platform generates high-quality, intuitive visualizations, including prediction effect plots, model performance reports, feature dependency plots, Shapley- and abundance-informed cladograms (Sungrams), and hypothesis-tested violin plots, along with provisions for excluding participant bias and ensuring reproducibility of results. Together, these seek to make the platform a useful asset for scientists in the microbiome field and beyond. The MarkerML web server is freely available to the academic community at https://microbiome.igib.res.in/markerml/.
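
To make the idea of Shapley-based feature attribution concrete, the following is a minimal sketch of exact Shapley values computed by brute-force enumeration over feature coalitions. It is not MarkerML's implementation, which the abstract does not describe. The toy scoring model, the feature names (taxon_A, taxon_B, taxon_C), the sample abundances, and the all-zero baseline are hypothetical, chosen only for illustration. Practical SHAP tooling relies on approximations rather than full enumeration.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model: a score driven by three taxa abundances,
# with an interaction between taxon_A and taxon_C (assumption, not from the paper).
def toy_model(x):
    return 0.5 * x["taxon_A"] + 0.2 * x["taxon_B"] + 0.4 * x["taxon_A"] * x["taxon_C"]

# Hypothetical sample and baseline ("feature absent") abundances.
sample = {"taxon_A": 1.0, "taxon_B": 2.0, "taxon_C": 0.5}
baseline = {"taxon_A": 0.0, "taxon_B": 0.0, "taxon_C": 0.0}

def coalition_value(present):
    """Model output when only features in `present` take their sample values."""
    x = {f: (sample[f] if f in present else baseline[f]) for f in sample}
    return toy_model(x)

def shapley_values(value_fn, features):
    """Exact Shapley values via enumeration of all coalitions (exponential cost)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes feature f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi

phi = shapley_values(coalition_value, list(sample))
for name, value in phi.items():
    print(f"{name}: {value:.3f}")
```

In this sketch the attributions sum to the difference between the model output for the sample and for the baseline, and the interaction term is split evenly between taxon_A and taxon_C. This is the sense in which Shapley values quantify both the role of each feature and the dependence between features.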

Keywords: SHAP; interpretable machine learning; marker features; metagenomic biomarkers; microbiome.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Datasets as Topic
  • Humans
  • Internet Use*
  • Machine Learning*
  • Metagenome
  • Metagenomics*
  • Reproducibility of Results