Does modeling causal relationships improve the accuracy of predicting lactation milk yields?

JDS Commun. 2023 Jul 21;4(5):358-362. doi: 10.3168/jdsc.2022-0343. eCollection 2023 Sep.

Abstract

This study compared 3 correlational models (best prediction, linear regression, and feed-forward neural networks) and 2 causal models (recursive structural equation model and recurrent neural networks) for estimating lactation milk yields. The correlational models assumed associations between test-day milk yields (and health conditions), whereas the causal models postulated unidirectional recursive effects between these test-day variables. Wood lactation curves were used to simulate the data and served as a benchmark model. Individual Wood lactation curves provided an excellent parametric interpretation of lactation dynamics, with their prediction accuracies depending on how well the available test days covered the lactation curve dynamics. Best prediction outperformed the other models in the absence of mastitis but was suboptimal when mastitis was present and unaccounted for. Recurrent neural networks yielded the highest accuracy when mastitis was present. Although causal models facilitated inference about the causality underlying lactation, precisely capturing the causal relationships was challenging because the underlying biology was complex. Misspecification of recursive effects in the recursive structural equation model resulted in a loss of accuracy. Hence, modeling causal relationships does not necessarily guarantee improved accuracy. In practice, a parsimonious model that balances model complexity and accuracy is preferred. In addition to the choice of statistical model, properly accounting for factors and covariates affecting milk yields is equally crucial.
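
For readers unfamiliar with the benchmark, the Wood lactation curve is commonly written as y(t) = a * t^b * exp(-c*t), where t is days in milk. The sketch below is a minimal illustration of that general form, not the simulation used in the study: the parameter values, the 305-day horizon, and the function names are assumptions chosen only for demonstration.

```python
import math

def wood_yield(t, a, b, c):
    """Daily milk yield on day t under the Wood curve y(t) = a * t**b * exp(-c*t)."""
    return a * t**b * math.exp(-c * t)

def lactation_yield(a, b, c, days=305):
    """Approximate lactation yield as the sum of daily yields over `days` days."""
    return sum(wood_yield(t, a, b, c) for t in range(1, days + 1))

def peak_day(b, c):
    """Day of peak yield: setting dy/dt = 0 gives t = b / c."""
    return b / c

# Hypothetical parameters for illustration only (not taken from the paper).
A, B, C = 20.0, 0.2, 0.004

if __name__ == "__main__":
    print("peak day:", peak_day(B, C))
    print("yield at peak (kg):", round(wood_yield(peak_day(B, C), A, B, C), 2))
    print("lactation yield (kg):", round(lactation_yield(A, B, C), 1))
```

Here a scales the overall level of the curve, b governs the rise to peak, and c governs the decline after peak, which is why the curve offers a direct parametric interpretation of lactation dynamics.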