A Bayesian approach for analysis of ordered categorical responses subject to misclassification

Ashley Ling; El Hamidi Hay; Samuel E Aggrey; Romdhane Rekaya

doi:10.1371/journal.pone.0208433

PLoS One. 2018 Dec 13;13(12):e0208433. doi: 10.1371/journal.pone.0208433. eCollection 2018.

Authors

Ashley Ling¹, El Hamidi Hay², Samuel E Aggrey³,⁴, Romdhane Rekaya¹,³,⁵

Affiliations

¹ Department of Animal and Dairy Science, University of Georgia, Athens, Georgia, United States of America.
² USDA Agricultural Research Service, Fort Keogh Livestock and Range Research Laboratory, Miles City, Montana, United States of America.
³ Institute of Bioinformatics, University of Georgia, Athens, Georgia, United States of America.
⁴ Department of Poultry Science, University of Georgia, Athens, Georgia, United States of America.
⁵ Department of Statistics, University of Georgia, Athens, Georgia, United States of America.

Abstract

Ordinal categorical responses are frequently collected in survey studies, human medicine, and animal and plant improvement programs, to name a few. Errors in this type of data are neither rare nor easy to detect. These errors tend to bias inference, reduce statistical power, and ultimately reduce the efficiency of the decision-making process. In contrast to the binary case, where misclassification occurs between two response classes, noise in ordinal categorical data is more complex because of the larger number of categories and the diversity and asymmetry of errors. Although several approaches have been presented for dealing with misclassification in binary data, only a few practical methods have been proposed for analyzing noisy categorical responses. A latent variable model implemented within a Bayesian framework was proposed to analyze ordinal categorical data subject to misclassification, and it was evaluated using simulated and real datasets. The simulated scenario consisted of a discrete response with three categories and a symmetric error rate of 5% between any two classes. The real data consisted of calving ease records of beef cows. In both the real and simulated data, ignoring misclassification resulted in substantial bias in the estimation of genetic parameters and reduced the accuracy of predicted breeding values. With the proposed approach, a significant reduction in bias and an increase in accuracy ranging from 11% to 17% were observed. Furthermore, most of the misclassified observations (in the simulated data) were identified with a substantially higher probability. Similar results were observed for a scenario with asymmetric misclassification. While the extension to traits with more categories between adjacent classes is straightforward, it could be computationally costly. For traits with high heritability, the performance of the methodology would be expected to improve.
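The simulated scenario described in the abstract (three ordered categories, 5% symmetric misclassification) can be illustrated with a minimal sketch. This is not the authors' simulation code. The threshold values, sample size, seeds, and function names are hypothetical, the latent liability contains no fixed or genetic effects, and "symmetric" is interpreted here as an assumption: each record is altered with probability 0.05 and, if altered, moved to one of the other two categories with equal probability.

```python
import random


def simulate_ordinal(n, thresholds=(-0.5, 0.8), seed=1):
    """Cut standard-normal latent liabilities into ordered categories 1..K.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    categories = []
    for _ in range(n):
        liability = rng.gauss(0.0, 1.0)
        # Category = 1 + number of thresholds the liability exceeds.
        categories.append(1 + sum(liability > t for t in thresholds))
    return categories


def misclassify(categories, error_rate=0.05, n_categories=3, seed=2):
    """Assumed symmetric noise: with probability error_rate, replace the true
    category by one of the other categories chosen uniformly at random."""
    rng = random.Random(seed)
    observed, flipped = [], []
    for c in categories:
        if rng.random() < error_rate:
            others = [k for k in range(1, n_categories + 1) if k != c]
            observed.append(rng.choice(others))
            flipped.append(True)
        else:
            observed.append(c)
            flipped.append(False)
    return observed, flipped


true_y = simulate_ordinal(10_000)
obs_y, flipped = misclassify(true_y)
observed_rate = sum(flipped) / len(flipped)
print(f"share of records misclassified: {observed_rate:.3f}")
```

In an analysis that ignores misclassification, `obs_y` would be treated as if it were `true_y`. The approach described in the abstract instead treats the true category as unknown and estimates, for each record, the probability that it was misclassified.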

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Animals
Bayes Theorem
Bias
Body Weight / physiology
Breeding / methods
Breeding / statistics & numerical data*
Cattle* / classification
Cattle* / genetics
Datasets as Topic / classification
Datasets as Topic / statistics & numerical data
Female
Genetic Association Studies / statistics & numerical data
Genetic Association Studies / veterinary
Markov Chains
Meat / statistics & numerical data
Models, Statistical*
Parturition / physiology
Phenotype
Physical Fitness
Pregnancy
Quantitative Trait, Heritable

Grants and funding

AL was funded by the United States Department of Agriculture (USDA) National Institute of Food and Agriculture (NIFA) through the National Needs Grant, grant number 11754154 to RR, https://nifa.usda.gov/. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.