Impact of sensor data pre-processing strategies and selection of machine learning algorithm on the prediction of metritis events in dairy cattle

Prev Vet Med. 2023 Jun:215:105903. doi: 10.1016/j.prevetmed.2023.105903. Epub 2023 Mar 28.

Abstract

With the sensor data now generated at high frequency on dairy farms, there is potential for earlier diagnosis of postpartum diseases than traditional monitoring methods allow. Our objectives were 1) to compare the impact of sensor data pre-processing on classifier performance by using multiple time windows before a given metritis event, while considering other cow-level factors and farm-scheduled activities; 2) to compare the performance of random forest (RF), k-nearest neighbors (k-NN), and support vector machine (SVM) classifiers at different decision thresholds using different numbers of past observations (time-lags) for the detection of behavioral patterns associated with changes in metritis scores; and 3) to compare classifier performance among the five behaviors registered every hour by an ear-tag 3-axis accelerometer (CowManager, Agis Automatisering, Harmelen, Netherlands). A total of 239 metritis events were defined by comparing metritis scores between two consecutive clinical evaluations of cows that were retrospectively selected from a dataset containing sensor data and health information for the first 21 days postpartum, collected from June 2014 to May 2017. Hourly sensor data, classified by the accelerometer as ruminating, eating, not active (standing or lying), active, or high activity, for the 3 days before each metritis event were aggregated into 24-, 12-, 6-, and 3-hour time windows. Multiple time-lags were also tested to determine the number of past observations needed for optimal classification. Similarly, different decision thresholds were compared in terms of model performance. Depending on the classifier, algorithm hyperparameters were optimized using grid search (RF, k-NN, SVM) and random search (RF). All behaviors changed throughout the study period and showed distinct daily patterns.
Of the three algorithms, RF had the highest F1 score, followed by k-NN and SVM. Furthermore, sensor data aggregated into 6- or 12-hour time windows gave the best model performance at multiple time-lags. We concluded that data from the first 3 days postpartum should be discarded when studying metritis, and that any one of the five behaviors measured with CowManager could be used to predict metritis when sensor data were aggregated into 6- or 12-hour time windows, using time-lags corresponding to 2-3 days before a given event, depending on the time window used. This study shows how to make fuller use of sensor data for disease prediction and improve the performance of machine learning algorithms.
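
The pre-processing steps described in the abstract (aggregating hourly behavior records into fixed time windows, keeping a chosen number of past windows as time-lags, and scoring a classifier with F1 at a given decision threshold) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the use of the mean as the aggregation statistic is an assumption (the abstract does not state it), and all numbers are made-up placeholders rather than study data.

```python
from statistics import mean


def aggregate_windows(hourly, window_hours):
    """Average consecutive, non-overlapping blocks of `window_hours` hourly values.

    Assumption: the mean is used here only for illustration.
    """
    if len(hourly) % window_hours != 0:
        raise ValueError("series length must be a multiple of the window size")
    return [
        mean(hourly[i:i + window_hours])
        for i in range(0, len(hourly), window_hours)
    ]


def lag_features(windows, n_lags):
    """Return the `n_lags` most recent aggregated windows, oldest first."""
    if not 0 < n_lags <= len(windows):
        raise ValueError("n_lags must be between 1 and the number of windows")
    return windows[-n_lags:]


def f1_at_threshold(probs, labels, threshold):
    """F1 score after turning predicted probabilities into 0/1 labels."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 72 hourly values (3 days before an event) for one behavior.
hourly_values = list(range(72))

# 12-hour windows give 6 aggregated observations over the 3 days.
windows_12h = aggregate_windows(hourly_values, 12)

# Keep the 4 most recent windows (2 days of history) as model features.
features = lag_features(windows_12h, 4)

# Hypothetical classifier outputs, scored at two decision thresholds.
probs = [0.9, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0]
f1_default = f1_at_threshold(probs, labels, 0.5)
f1_lower = f1_at_threshold(probs, labels, 0.3)
```

In this sketch the window size, the number of lags, and the decision threshold are independent settings, which is why the study could compare them separately; a feature vector built this way could then be passed to any of the three classifiers.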

Keywords: Classification algorithms; Dairy cattle behavior; Postpartum period; Precision dairy technology; Predictive modeling.

MeSH terms

  • Algorithms
  • Animals
  • Cattle
  • Cattle Diseases* / diagnosis
  • Eating
  • Female
  • Machine Learning
  • Postpartum Period*
  • Retrospective Studies