Benchmarking framework for machine learning classification from fNIRS data

Johann Benerradi; Jeremie Clos; Aleksandra Landowska; Michel F Valstar; Max L Wilson

doi:10.3389/fnrgo.2023.994969

Front Neuroergon. 2023 Mar 3:4:994969. doi: 10.3389/fnrgo.2023.994969. eCollection 2023.

Authors

Johann Benerradi¹, Jeremie Clos¹, Aleksandra Landowska¹, Michel F Valstar¹, Max L Wilson¹

Affiliation

¹ School of Computer Science, University of Nottingham, Nottingham, United Kingdom.

Abstract

Background: While efforts have been made to establish best practices for functional near-infrared spectroscopy (fNIRS) signal processing, there are still no community standards for applying machine learning to fNIRS data. Moreover, the lack of open-source benchmarks and of standard expectations for reporting means that published works often claim high generalisation capabilities while relying on poor practices or omitting details from the paper. These issues make it hard to evaluate the performance of models when choosing them for brain-computer interfaces.

Methods: We present an open-source benchmarking framework, BenchNIRS, to establish a best-practice machine learning methodology for evaluating models applied to fNIRS data, using five open access datasets for brain-computer interface (BCI) applications. The BenchNIRS framework, using a robust methodology with nested cross-validation, enables researchers to optimise models and evaluate them without bias. The framework also enables us to produce useful metrics and figures to detail the performance of new models for comparison. To demonstrate the utility of the framework, we present a benchmarking of six baseline models [linear discriminant analysis (LDA), support-vector machine (SVM), k-nearest neighbours (kNN), artificial neural network (ANN), convolutional neural network (CNN), and long short-term memory (LSTM)] on the five datasets and investigate the influence of different factors on the classification performance, including the number of training examples and the size of the time window of each fNIRS sample used for classification. We also present results with a sliding window as opposed to simple classification of epochs, and with a personalised approach (within-subject data classification) as opposed to a generalised approach (unseen-subject data classification).
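As an illustration of the nested cross-validation idea in the generalised (unseen-subject) setting, the following is a minimal sketch using only the Python standard library. It is not the BenchNIRS API: the function names, the round-robin fold assignment, the fold counts, the toy kNN classifier, and the synthetic data are all assumptions made for this example. The inner loop selects a hyperparameter using only development subjects, and the outer loop scores the selected model on subjects never seen during selection.

```python
import random
from collections import Counter


def subject_folds(subjects, n_folds):
    """Partition the unique subject ids into n_folds disjoint folds (round-robin)."""
    unique = sorted(set(subjects))
    return [unique[i::n_folds] for i in range(n_folds)]


def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (squared Euclidean)."""
    nearest = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test examples that kNN (fitted on train) labels correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in test)
    return hits / len(test)


def nested_cv(data, k_grid=(1, 3, 5), n_outer=5, n_inner=4):
    """data: list of (subject_id, features, label) tuples.

    Returns one (selected_k, test_accuracy) pair per outer fold.
    """
    results = []
    for test_subjects in subject_folds([s for s, _, _ in data], n_outer):
        held_out = set(test_subjects)
        dev = [d for d in data if d[0] not in held_out]
        test = [(x, y) for s, x, y in data if s in held_out]
        inner_folds = subject_folds([s for s, _, _ in dev], n_inner)

        def inner_score(k):
            # Mean validation accuracy over inner folds, development subjects only.
            accs = []
            for val_subjects in inner_folds:
                val = set(val_subjects)
                tr = [(x, y) for s, x, y in dev if s not in val]
                va = [(x, y) for s, x, y in dev if s in val]
                accs.append(accuracy(tr, va, k))
            return sum(accs) / len(accs)

        best_k = max(k_grid, key=inner_score)
        # Refit on all development data, then score once on the held-out subjects.
        dev_xy = [(x, y) for _, x, y in dev]
        results.append((best_k, accuracy(dev_xy, test, best_k)))
    return results


# Synthetic stand-in for epoch features: 10 subjects, 6 epochs each, 2 classes.
random.seed(0)
data = []
for subject in range(10):
    for epoch in range(6):
        label = epoch % 2
        features = (random.gauss(label, 1.0), random.gauss(-label, 1.0))
        data.append((subject, features, label))

results = nested_cv(data)
for fold, (k, acc) in enumerate(results):
    print(f"outer fold {fold}: selected k={k}, test accuracy={acc:.2f}")
```

Splitting by subject rather than by epoch is the design choice that matters here: no subject contributes data to both sides of any split, so the outer-fold scores estimate performance on unseen subjects rather than on unseen epochs from known subjects.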

Results and discussion: Results show that the performance is typically lower than the scores often reported in the literature, with no great differences between models, highlighting that predicting unseen data remains a difficult task. Our benchmarking framework provides future authors who achieve significantly high classification scores with a tool to demonstrate their advances in a comparable way. To complement our framework, we contribute a set of recommendations for methodology decisions and for writing papers when applying machine learning to fNIRS data.

Keywords: benchmarking; deep learning; fNIRS; guidelines; machine learning; neural networks; open access data.

Grants and funding

This work was supported by the Engineering and Physical Sciences Research Council [EP/T022493/1 and EP/V00784X/1].