The raters' differences in Arabic writing rubrics through the Many-Facet Rasch measurement model

Front Psychol. 2022 Dec 16;13:988272. doi: 10.3389/fpsyg.2022.988272. eCollection 2022.

Abstract

Writing assessment relies heavily on scoring the quality of a subject's ideas in writing. This creates a faceted measurement structure involving rubrics, tasks, and raters. Nevertheless, most studies have not systematically considered differences among raters. This study examines rater differences in relation to the reliability and validity of writing rubrics, using the Many-Facet Rasch measurement model (MFRM) to model these differences. A set of standards for evaluating rating quality in writing assessment was examined. Rating quality was tested within four writing domains from an analytic rubric, scored on a scale of one to three. The writing domains explored were vocabulary, grammar, language use, and organization; the data were obtained from 15 Arabic essays written by religious secondary school students under the supervision of the Malaysian Ministry of Education. Five practicing raters in the field were selected to evaluate all the essays. The results showed that (a) raters varied considerably on the leniency-severity dimension, so rater variation ought to be modeled; (b) combining ratings across raters reduces score uncertainty, thereby lowering the measurement error that can weaken criterion validity with external variables; and (c) MFRM adjustments effectively increased the correlations between scores obtained from partial and full data. The main findings revealed that rating quality varies across the domains of an analytic rubric. They also show that the MFRM is an effective way to model rater differences and to evaluate the validity and reliability of writing rubrics.
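For readers unfamiliar with the model, the following is a minimal sketch of the standard three-facet MFRM in its rating-scale formulation (Linacre's common parameterization is assumed here; the symbols are illustrative and not taken from the paper):

\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \lambda_j - \tau_k

where P_{nijk} is the probability that examinee n receives category k from rater j on writing domain i, \theta_n is the examinee's writing proficiency, \delta_i is the difficulty of domain i, \lambda_j is the severity of rater j, and \tau_k is the threshold for stepping from category k-1 to category k. A more severe rater (larger \lambda_j) lowers the log-odds of a higher category, which is the leniency-severity dimension the abstract refers to; estimating \lambda_j allows scores to be adjusted for rater differences.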

Keywords: Arabic essays validation; Many-Facet Rasch measurement model (MFRM); analytic rubric; raters; writing assessment; writing domains.