Inter-rater reliability of functional MRI data quality control assessments: A standardised protocol and practical guide using pyfMRIqc

Brendan Williams; Nicholas Hedger; Carolyn B McNabb; Gabriella M K Rossetti; Anastasia Christakou

doi:10.3389/fnins.2023.1070413


Front Neurosci. 2023 Feb 3:17:1070413. doi: 10.3389/fnins.2023.1070413. eCollection 2023.

Authors

Brendan Williams¹,², Nicholas Hedger¹,², Carolyn B McNabb³, Gabriella M K Rossetti¹,², Anastasia Christakou¹,²

Affiliations

¹ Centre for Integrative Neuroscience and Neurodynamics, University of Reading, Reading, United Kingdom.
² School of Psychology and Clinical Language Sciences, University of Reading, Reading, United Kingdom.
³ Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom.

Abstract

Quality control is a critical step in the processing and analysis of functional magnetic resonance imaging data. Its purpose is to remove problematic data that could otherwise lead to downstream errors in the analysis and reporting of results. Manual inspection of data can be laborious and is susceptible to human error. Automated tools aim to mitigate these issues. One such tool is pyfMRIqc, which we previously developed as a user-friendly method for assessing data quality. However, such tools still generate output that requires subjective judgement about whether the quality of a given dataset meets an acceptable standard for further analysis. Here we present a quality control protocol using pyfMRIqc and assess the inter-rater reliability of four independent raters applying this protocol to data from the fMRI Open QC project (https://osf.io/qaesm/). Raters classified each dataset as "include," "uncertain," or "exclude." Agreement between raters was moderate to substantial for "include" and "exclude," but little to none for "uncertain." In most cases, only a single rater used the "uncertain" classification for a given participant's data, and the remaining raters agreed on the "include"/"exclude" decision in all but one case. We suggest several approaches to increase rater agreement and reduce disagreement for "uncertain" cases, improving classification consistency.
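The abstract reports agreement separately for each classification category but does not name the statistic used. The sketch below assumes Fleiss' kappa for multiple raters, both overall and category-specific, with the Landis and Koch verbal labels ("moderate," "substantial," and so on). It also includes a check for the pattern described above: a single "uncertain" rating while the other raters agree. All function names are hypothetical and not part of pyfMRIqc, and the ratings are a made-up toy example, not data from the study.

```python
from collections import Counter

CATEGORIES = ("include", "uncertain", "exclude")


def count_matrix(ratings, categories=CATEGORIES):
    """Turn per-subject rater labels into counts n_ij (subjects x categories)."""
    n = len(ratings[0])  # number of raters per subject
    rows = []
    for subject in ratings:
        if len(subject) != n:
            raise ValueError("every subject needs the same number of raters")
        counts = Counter(subject)
        rows.append([counts[cat] for cat in categories])
    return rows, n


def fleiss_kappa(ratings, categories=CATEGORIES):
    """Overall Fleiss' kappa: (P_bar - P_e) / (1 - P_e)."""
    rows, n = count_matrix(ratings, categories)
    N = len(rows)
    # p_j: share of all assignments that went to category j
    p = [sum(r[j] for r in rows) / (N * n) for j in range(len(categories))]
    # P_i: proportion of agreeing rater pairs for subject i
    P_i = [(sum(x * x for x in r) - n) / (n * (n - 1)) for r in rows]
    P_bar = sum(P_i) / N
    P_e = sum(pj * pj for pj in p)
    if P_e == 1:
        return float("nan")  # every rating in one category: kappa undefined
    return (P_bar - P_e) / (1 - P_e)


def category_kappas(ratings, categories=CATEGORIES):
    """Category-specific Fleiss' kappa for each label."""
    rows, n = count_matrix(ratings, categories)
    N = len(rows)
    out = {}
    for j, cat in enumerate(categories):
        pj = sum(r[j] for r in rows) / (N * n)
        if pj in (0.0, 1.0):
            out[cat] = float("nan")  # category never or always used
            continue
        disagreement = sum(r[j] * (n - r[j]) for r in rows)
        out[cat] = 1 - disagreement / (N * n * (n - 1) * pj * (1 - pj))
    return out


def landis_koch(kappa):
    """Verbal label for a kappa value (Landis and Koch bands)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


def lone_uncertain_with_consensus(subject):
    """True if exactly one rater chose 'uncertain' and all others agree."""
    others = [label for label in subject if label != "uncertain"]
    return len(others) == len(subject) - 1 and len(set(others)) == 1


# Hypothetical toy ratings: 5 participants x 4 raters (NOT study data).
example_ratings = [
    ["include", "include", "include", "include"],
    ["exclude", "exclude", "exclude", "exclude"],
    ["include", "include", "uncertain", "include"],
    ["exclude", "exclude", "exclude", "uncertain"],
    ["include", "include", "include", "include"],
]

if __name__ == "__main__":
    k = fleiss_kappa(example_ratings)
    print(f"overall kappa = {k:.3f} ({landis_koch(k)})")
    for cat, kc in category_kappas(example_ratings).items():
        print(f"{cat:>9}: kappa = {kc:.3f} ({landis_koch(kc)})")
    lone = sum(lone_uncertain_with_consensus(s) for s in example_ratings)
    print("lone 'uncertain' ratings with consensus:", lone)
```

In this toy example, "include" and "exclude" reach "substantial" category-specific agreement while "uncertain" falls below zero. This mirrors the qualitative pattern in the abstract, because a label that only one rater ever uses contributes disagreement without any matching agreement.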

Keywords: fMRI; inter-rater reliability; quality control; resting state fMRI; task fMRI.