Factors affecting inter-rater agreement in human classification of eye movements: a comparison of three datasets

Lee Friedman; Vladyslav Prokopenko; Shagen Djanian; Dmytro Katrychuk; Oleg V Komogortsev

doi:10.3758/s13428-021-01782-4

Behav Res Methods. 2023 Jan;55(1):417-427. doi: 10.3758/s13428-021-01782-4. Epub 2022 Apr 11.

Authors

Lee Friedman¹, Vladyslav Prokopenko², Shagen Djanian²,³, Dmytro Katrychuk², Oleg V Komogortsev²

Affiliations

¹ Derrick M5, Department of Computer Science, Texas State University, 601 University Drive, San Marcos, Texas, 78640, USA. lfriedman10@gmail.com.
² Derrick M5, Department of Computer Science, Texas State University, 601 University Drive, San Marcos, Texas, 78640, USA.
³ Department of Computer Science, Aalborg University, Selma Lagerlofs Vej 300, 9220, Aalborg East, Denmark.

PMID: 35411475
DOI: 10.3758/s13428-021-01782-4

Abstract

Manual classification of eye-movements is used in research and as a basis for comparison with automatic algorithms in the development phase. However, human classification will not be useful if it is unreliable or unrepeatable. Therefore, it is important to know what factors might influence and enhance the accuracy and reliability of human classification of eye-movements. In this report, we compare three datasets of human manual classification: two earlier datasets and our own dataset, which we present here for the first time. For inter-rater reliability, we assess both the event-level F1-score and the sample-level Cohen's κ across groups of raters. The report points to several possible influences on human classification reliability: eye-tracker quality, use of head restraint, characteristics of the recorded subjects, the availability of detailed scoring rules, and the characteristics and training of the raters.
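
To make the sample-level measure concrete, the following is a minimal sketch of Cohen's κ for two raters who each assign one label to every recorded sample. It uses the standard definition κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name, the label codes, and the example sequences are hypothetical and are not taken from the paper; the authors' own pipeline may differ.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Sample-level Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(rater_a)
    # Observed agreement: fraction of samples given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label
    # frequencies, summed over all labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is
        # formally undefined, and this sketch reports perfect agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical per-sample labels (assumed codes, not from the paper):
# F = fixation, S = saccade, P = post-saccadic oscillation.
rater_a = list("FFFFSSPPFF")
rater_b = list("FFFSSSPFFF")
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.3f}")
```

In this made-up example the raters agree on 8 of 10 samples (p_o = 0.80) and chance agreement is p_e = 0.44, so κ ≈ 0.64. The event-level F1-score is not sketched here because it depends on a rule for matching events between raters, which this abstract does not specify.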

Keywords: Cohen’s Kappa; Event-level agreement; Eye-movements; F1-score; Manual classification; Sample-level agreement.

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms*
Eye Movements*
Humans
Observer Variation
Reproducibility of Results