Does modeling causal relationships improve the accuracy of predicting lactation milk yields?

Xiao-Lin Wu; Asha M Miles; Curtis P Van Tassell; George R Wiggans; H Duane Norman; Ransom L Baldwin; Javier Burchard; João Dürr

doi:10.3168/jdsc.2022-0343


JDS Commun. 2023 Jul 21;4(5):358-362. doi: 10.3168/jdsc.2022-0343. eCollection 2023 Sep.

Authors

Xiao-Lin Wu¹,², Asha M Miles³, Curtis P Van Tassell³, George R Wiggans¹, H Duane Norman¹, Ransom L Baldwin³, Javier Burchard¹, João Dürr¹

Affiliations

¹ Council on Dairy Cattle Breeding, Bowie, MD 20716.
² Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI 53706.
³ USDA, Agricultural Research Service, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705.

Abstract

This study compared 3 correlational models (best prediction, linear regression, and feed-forward neural networks) and 2 causal models (recursive structural equation model and recurrent neural networks) for estimating lactation milk yields. The correlational models assumed associations among test-day milk yields (and health conditions), whereas the causal models postulated unidirectional recursive effects between these test-day variables. Wood lactation curves were used to simulate the data and served as a benchmark model. Individual Wood lactation curves provided an excellent parametric interpretation of lactation dynamics, with prediction accuracy depending on how fully the data covered the lactation curve dynamics. Best prediction outperformed the other models in the absence of mastitis but was suboptimal when mastitis was present and unaccounted for. Recurrent neural networks yielded the highest accuracy when mastitis was present. Although causal models facilitated inference about the causality underlying lactation, precisely capturing the causal relationships was challenging because the underlying biology is complex. Misspecification of recursive effects in the recursive structural equation model reduced accuracy. Hence, modeling causal relationships does not necessarily improve accuracy. In practice, a parsimonious model that balances model complexity and accuracy is preferred. In addition to the choice of statistical model, properly accounting for the factors and covariates affecting milk yields is equally crucial.
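The Wood lactation curve used above as the simulation and benchmark model has the standard form y(t) = a·t^b·e^(-ct), where t is days in milk, and its peak falls at t = b/c. The following is a minimal Python sketch, not the authors' simulation code. It simulates test-day records from a Wood curve, refits the curve by least squares on the log scale (ln y = ln a + b ln t - c t), and sums daily yields to a 305-day lactation yield. The parameter values, roughly monthly test-day spacing, noise level, and function names (wood, fit_wood, lactation_yield) are hypothetical choices made for illustration only.

```python
import numpy as np

def wood(t, a, b, c):
    """Wood curve: expected daily milk yield at day in milk t."""
    return a * np.power(t, b) * np.exp(-c * t)

def lactation_yield(a, b, c, days=305):
    """Approximate lactation yield by summing daily yields over days 1..days."""
    t = np.arange(1, days + 1, dtype=float)
    return float(wood(t, a, b, c).sum())

def fit_wood(t, y):
    """Fit (a, b, c) by ordinary least squares on ln y = ln a + b ln t - c t."""
    t = np.asarray(t, dtype=float)
    X = np.column_stack([np.ones_like(t), np.log(t), -t])
    coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    ln_a, b, c = coef
    return float(np.exp(ln_a)), float(b), float(c)

# Hypothetical parameters (kg/day scale); peak day = b / c = 50.
true = (20.0, 0.2, 0.004)
rng = np.random.default_rng(1)
test_days = np.arange(15, 305, 30)  # 10 roughly monthly test days
# Multiplicative noise keeps simulated yields positive for the log fit.
obs = wood(test_days, *true) * np.exp(rng.normal(0.0, 0.05, test_days.size))

est = fit_wood(test_days, obs)
print("true 305-d yield:  ", round(lactation_yield(*true)))
print("fitted 305-d yield:", round(lactation_yield(*est)))
```

This sketch has no mastitis effect. In the setting the abstract describes, a health event would perturb the test-day yields after its onset, and a single smooth curve fit like this one would not account for it.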