Unsatisfactory reproducibility of interstitial inflammation scoring in allograft kidney biopsy

Shun-Chen Huang; Yi-Jia Lin; Mei-Chin Wen; Wei-Chou Lin; Pei-Wei Fang; Peir-In Liang; Hao-Wen Chuang; Hui-Ping Chien; Tai-Di Chen

doi:10.1038/s41598-023-33908-3

Sci Rep. 2023 May 1;13(1):7095. doi: 10.1038/s41598-023-33908-3.

Authors

Shun-Chen Huang^#¹, Yi-Jia Lin^#², Mei-Chin Wen³, Wei-Chou Lin⁴, Pei-Wei Fang⁵, Peir-In Liang⁶, Hao-Wen Chuang⁷, Hui-Ping Chien⁸, Tai-Di Chen⁹

Affiliations

¹ Department of Anatomic Pathology, Chang Gung Memorial Hospital Kaohsiung Branch, Kaohsiung, Taiwan.
² Department of Pathology, Tri-service General Hospital, National Defense Medical Center, Taipei, Taiwan.
³ Department of Pathology, China Medical University Hsinchu Hospital, Hsinchu, Taiwan.
⁴ Department of Pathology, National Taiwan University Hospital, Taipei, Taiwan.
⁵ Department of Pathology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City, Taiwan.
⁶ Department of Pathology, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan.
⁷ Department of Pathology and Laboratory Medicine, Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan.
⁸ Department of Pathology and Laboratory Medicine, Shin Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan.
⁹ Department of Anatomic Pathology, Chang Gung Memorial Hospital Linkou Main Branch, Taoyuan, Taiwan. b8902028@msn.com.

^# Contributed equally.

Abstract

Interstitial inflammation scoring is incorporated into the Banff Classification of Renal Allograft Pathology and is essential for the diagnosis of T-cell-mediated rejection. However, its reproducibility, including inter-rater and intra-rater reliabilities, has not been carefully investigated. In this study, eight renal pathologists from different hospitals independently scored 45 kidney allograft biopsies with varying extents of interstitial inflammation. Inter-rater reliabilities and intra-rater reliabilities were investigated by kappa statistics and conditional agreement probabilities. Individual pathologists' scoring patterns were examined by chi-squared tests and proportions tests. The mean pairwise kappa values for inter-rater reliability were 0.27, 0.30, and 0.26 for the Banff i score, ti score, and i-IFTA, respectively. No rater pair performed consistently better or worse than others on all three scorings. After dichotomizing the scores into two groups (none/mild and moderate/severe inflammation), the averaged conditional agreements ranged from 47.1% to 50.0%. The distributions of the scores differed among raters, and some pathologists persistently scored higher or lower than others. Given the important role of interstitial inflammation scoring in the diagnosis of T-cell-mediated rejection, transplant practitioners should be aware of the possible clinical implications of the far-from-optimal reproducibility.
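For readers unfamiliar with the "mean pairwise kappa" reported above, the following is a minimal sketch of how such a figure could be computed. It assumes unweighted Cohen's kappa; the abstract does not state which kappa variant the authors used. The function names, the rater labels, and the toy scores are hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical scores.

    Assumption: unweighted kappa; the source does not specify the variant.
    """
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of cases where both raters gave the same score.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(scores_by_rater):
    """Average Cohen's kappa over every unordered pair of raters."""
    pairs = list(combinations(scores_by_rater, 2))
    total = sum(
        cohen_kappa(scores_by_rater[r1], scores_by_rater[r2]) for r1, r2 in pairs
    )
    return total / len(pairs)


# Hypothetical toy scores (Banff-style 0-3 grades) for three raters on six biopsies.
toy_scores = {
    "rater_A": [0, 1, 2, 3, 1, 0],
    "rater_B": [0, 2, 2, 3, 0, 0],
    "rater_C": [1, 1, 3, 3, 1, 0],
}
print(round(mean_pairwise_kappa(toy_scores), 3))
```

With eight raters, as in the study, the average would run over 28 rater pairs.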

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Allografts
Biopsy
Graft Rejection / pathology
Humans
Inflammation / pathology
Kidney / pathology
Kidney Transplantation*
Reproducibility of Results