Comparison of unweighted and item response theory-based weighted sum scoring for the Nine-Questions Depression-Rating Scale in the Northern Thai Dialect

BMC Med Res Methodol. 2022 Oct 12;22(1):268. doi: 10.1186/s12874-022-01744-0.

Abstract

Background: The Nine-Questions Depression-Rating Scale (9Q) was developed as an alternative tool for assessing the severity of depressive symptoms in Thai adults. The traditional unweighted sum scoring approach does not account for differences in how strongly each item loads on the underlying severity. We therefore developed an Item Response Theory (IRT)-based weighted sum scoring approach intended to be more precise than the unweighted sum score.

Methods: Secondary data from a study on the criterion-related validity of the 9Q in the northern Thai dialect were used in this study. All participants were interviewed to obtain demographic data, screened and evaluated for major depressive disorder and the severity of the associated depressive symptoms, and then assessed by a psychiatrist for a diagnosis of major depressive disorder. IRT models were used to estimate the discrimination and threshold parameters. Differential item functioning (DIF) in the responses to each item between males and females was assessed using likelihood-ratio tests. The IRT-based weighted sum score is defined as the linear combination of the individual item responses, weighted by the discrimination and threshold parameters, divided by the plausible maximum score. The weights were based on the graded-response model (GRM) for the 9Q score (9Q-GRM) or on the nominal-response model (NRM) for categorical combinations of the intensity and frequency of symptoms from the 9Q responses (9QSF-NRM). The performances of the two scoring procedures were compared using relative precision.
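
The scoring idea described in the Methods can be illustrated with a minimal Python sketch. The first function computes category probabilities under the standard graded-response model from a discrimination parameter and ordered thresholds. The second computes a normalized weighted sum from per-item, per-category weights. The function names, the example parameter values, and the weight table are hypothetical: the abstract does not state how the weights are derived from the discrimination and threshold parameters, so the weights here are simply supplied by the caller.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded-response model.

    theta: latent severity, a: item discrimination,
    thresholds: increasing list b_1..b_{K-1} for an item with K categories.
    """
    # Cumulative probabilities P(X >= k); P(X >= 0) = 1 and P(X >= K) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # P(X = k) is the difference of adjacent cumulative probabilities.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def weighted_sum_score(responses, weights):
    """Normalized weighted sum score in [0, 1].

    responses: chosen category index per item.
    weights: weights[i][k] is the (hypothetical) weight of category k on item i.
    The sum of chosen weights is divided by the largest attainable sum.
    """
    total = sum(weights[i][x] for i, x in enumerate(responses))
    maximum = sum(max(item_weights) for item_weights in weights)
    return total / maximum


# Illustrative values only; these are not estimates from the study.
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
example_weights = [[0.0, 1.0, 2.0, 3.0], [0.0, 0.5, 1.0, 1.5]]
score = weighted_sum_score([3, 0], example_weights)
```

With equal weights of 0, 1, 2, 3 on every item, `weighted_sum_score` reduces to the unweighted sum divided by its maximum, which makes the two approaches directly comparable on the same scale.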

Results: Of the 1,355 participants, 1,000 and 355 were randomly assigned to the development and validation groups for the IRT-based weighted scoring, respectively. Gender-related DIF was present for items 2 and 5 in the 9Q-GRM and for most items (all except items 3 and 6) in the 9QSF-NRM, so the parameters for these items could be estimated separately for each gender. The 9Q-GRM model accounting for DIF had higher precision (16.7%) than the unweighted sum-score approach.
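
A random split of this kind can be sketched as follows. This is only an illustration of the general procedure; the seed, the function name, and the use of integer participant identifiers are assumptions, and the abstract does not describe how the original randomization was carried out.

```python
import random


def split_sample(ids, n_dev, seed=0):
    """Randomly split identifiers into development and validation groups.

    ids: iterable of participant identifiers (hypothetical integers here).
    n_dev: size of the development group; the rest form the validation group.
    seed: fixed seed so the split is reproducible (an assumption).
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_dev], shuffled[n_dev:]


# 1,355 participants split into 1,000 (development) and 355 (validation).
dev, val = split_sample(range(1355), 1000)
```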

Discussion: Our findings suggest that weighted sum scoring with the IRT parameters can improve scoring when the 9Q is used to measure the severity of depressive symptoms in Thai adults. Accounting for DIF between the genders resulted in higher precision for IRT-based weighted scoring.

Keywords: Depressive symptoms; Differential item functioning; Graded-response model; Item response theory; Nine-Questions Depression-Rating Scale; Nominal-response model.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Depression* / diagnosis
  • Depressive Disorder, Major* / diagnosis
  • Female
  • Humans
  • Language
  • Male
  • Psychometrics
  • Research Design
  • Thailand