BioDiscViz: A visualization support and consensus signature selector for BioDiscML results

PLoS One. 2023 Nov 30;18(11):e0294750. doi: 10.1371/journal.pone.0294750. eCollection 2023.

Abstract

Machine learning (ML) algorithms are powerful tools for finding complex patterns and biomarker signatures when conventional statistical methods fail to identify them. Although the ML field has made significant progress, state-of-the-art methodologies for building efficient, non-overfitting models are not always applied in the literature. To this end, automated programs such as BioDiscML were designed to identify biomarker signatures and correlated features while avoiding overfitting through multiple evaluation strategies, such as cross-validation, bootstrapping and repeated holdout. To further improve BioDiscML and reach a broader audience, better visualization support and more flexibility in choosing the best models and signatures are needed. Thus, to provide researchers with an easily accessible and usable tool for in-depth investigation of BioDiscML outputs, we developed a visual interaction tool called BioDiscViz. This tool provides summaries, tables and graphics, in the form of Principal Component Analysis (PCA), UMAP and t-SNE plots, heatmaps and boxplots, for the best model and the correlated features. It also provides visual support for extracting a consensus signature from BioDiscML models using a combination of filters. BioDiscViz is intended as visual support for ML-based research and may create new opportunities in this field by opening it to a broader community.
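
The idea of extracting a consensus signature by filtering models can be illustrated with a minimal sketch. This is not BioDiscViz's implementation, which the abstract does not describe. The function name `consensus_signature`, the `score` and `features` fields, and both thresholds are hypothetical and chosen only for illustration. The sketch assumes each model reports one performance score and a list of selected features. It keeps the models that pass a score filter, then retains the features selected by at least a given fraction of those models.

```python
from collections import Counter


def consensus_signature(models, min_score=0.8, min_fraction=0.5):
    """Return features shared by enough of the models that pass a score filter.

    All names and thresholds here are illustrative assumptions,
    not part of BioDiscML or BioDiscViz.
    """
    # Filter step: keep only models whose score reaches the threshold.
    kept = [m for m in models if m["score"] >= min_score]
    if not kept:
        return []

    # Count in how many retained models each feature appears
    # (set() guards against a feature listed twice in one model).
    counts = Counter(f for m in kept for f in set(m["features"]))

    # Consensus step: keep features present in enough retained models.
    cutoff = min_fraction * len(kept)
    return sorted(f for f, n in counts.items() if n >= cutoff)


# Made-up example data, for illustration only.
models = [
    {"name": "model_a", "score": 0.91, "features": ["gene1", "gene2", "gene3"]},
    {"name": "model_b", "score": 0.88, "features": ["gene1", "gene3", "gene4"]},
    {"name": "model_c", "score": 0.62, "features": ["gene5", "gene6"]},
    {"name": "model_d", "score": 0.85, "features": ["gene1", "gene2"]},
]

signature = consensus_signature(models, min_score=0.8, min_fraction=0.6)
print(signature)
```

In this toy data, model_c is dropped by the score filter, and gene4 is dropped because only one of the three remaining models selects it. In practice, several metrics from different evaluation strategies could be combined in the filter step.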

MeSH terms

  • Algorithms*
  • Biomarkers
  • Consensus
  • Machine Learning*

Substances

  • Biomarkers

Grants and funding

Dr Steve Bilodeau received a grant from the Canadian Institutes of Health Research (Grant Number: 387762) for the broader project encompassing BioDiscViz. The funders played no role in the study design, data collection, analysis, the decision to publish, or the preparation of the manuscript.