Age grading An. gambiae and An. arabiensis using near infrared spectra and artificial neural networks

PLoS One. 2019 Aug 14;14(8):e0209451. doi: 10.1371/journal.pone.0209451. eCollection 2019.

Abstract

Background: Near infrared spectroscopy (NIRS) currently complements existing techniques for age-grading mosquitoes. NIRS classifies lab-reared and semi-field-raised mosquitoes as < 7 or ≥ 7 days old with an average accuracy of 80%, using a regression model that is trained with partial least squares (PLS) and interpreted as a binary classifier.
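To make "a regression model interpreted as a binary classifier" concrete, the following is a minimal sketch, not the authors' code. It assumes the regression model outputs an estimated age in days and that the 7-day cut-off from the text is applied to both the true and the predicted age. The function names and the sample ages are hypothetical.

```python
AGE_THRESHOLD_DAYS = 7  # cut-off taken from the text: < 7 vs >= 7 days old


def to_binary_label(age_days, threshold=AGE_THRESHOLD_DAYS):
    """Return 1 if the age is at or above the threshold, else 0."""
    return 1 if age_days >= threshold else 0


def thresholded_accuracy(true_ages, predicted_ages, threshold=AGE_THRESHOLD_DAYS):
    """Fraction of mosquitoes whose predicted age falls in the correct class.

    Both the true and the predicted (continuous) ages are mapped to a
    binary label before they are compared.
    """
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        to_binary_label(t, threshold) == to_binary_label(p, threshold)
        for t, p in zip(true_ages, predicted_ages)
    )
    return hits / len(true_ages)


# Hypothetical values for illustration only (not data from the study).
true_ages = [1, 5, 9, 12]
predicted_ages = [2.5, 7.4, 6.1, 15.0]
accuracy = thresholded_accuracy(true_ages, predicted_ages)
```

A directly trained binary classifier would instead be fitted on the 0/1 labels themselves, so its training objective matches the classification task rather than the age in days.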

Methods and findings: We explore whether using an artificial neural network (ANN) instead of PLS regression improves the accuracy of NIRS models for age-grading malaria-transmitting mosquitoes. We also explore whether directly training a binary classifier, rather than training a regression model and interpreting it as a binary classifier, improves accuracy. A total of 786 and 870 NIR spectra collected from laboratory-reared An. gambiae and An. arabiensis, respectively, were used and pre-processed according to previously published protocols. The ANN regression model scored a root mean squared error (RMSE) of 1.6 ± 0.2 for An. gambiae and 2.8 ± 0.2 for An. arabiensis, whereas the PLS regression model scored an RMSE of 3.7 ± 0.2 for An. gambiae and 4.5 ± 0.1 for An. arabiensis. When we interpreted the regression models as binary classifiers, the ANN regression model reached an accuracy of 93.7 ± 1.0% for An. gambiae and 90.2 ± 1.7% for An. arabiensis, while the PLS regression model reached 83.9 ± 2.3% for An. gambiae and 80.3 ± 2.1% for An. arabiensis. We also find that a directly trained binary classifier yields higher age estimation accuracy than a regression model interpreted as a binary classifier. A directly trained ANN binary classifier scored an accuracy of 99.4 ± 1.0% for An. gambiae and 99.0 ± 0.6% for An. arabiensis, while a directly trained PLS binary classifier scored 93.6 ± 1.2% for An. gambiae and 88.7 ± 1.1% for An. arabiensis. We further tested the reproducibility of these results on different independent mosquito datasets. ANNs scored higher estimation accuracies than the same age models trained using PLS. Regardless of the model architecture, directly trained binary classifiers scored higher accuracies in classifying mosquito age than regression models interpreted as binary classifiers.
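As an illustration of how figures of the form "RMSE of 1.6 ± 0.2" can be produced, the sketch below computes the root mean squared error for one set of predictions and summarizes several scores as a mean and a spread. The abstract does not say what the ± values represent, so treating them as a sample standard deviation across repeated runs is an assumption. The function names are hypothetical, and no study data are used.

```python
import math
import statistics


def rmse(true_ages, predicted_ages):
    """Root mean squared error between true and predicted ages."""
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(true_ages, predicted_ages)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def summarize(scores):
    """Return (mean, sample standard deviation) of per-run scores.

    Assumption: the spread is a sample standard deviation; the source
    does not specify this.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate a spread")
    return statistics.mean(scores), statistics.stdev(scores)
```

For example, `summarize([rmse(y, y_hat) for y, y_hat in runs])` would give one mean ± spread pair, where `runs` is a hypothetical list of (true, predicted) age lists from repeated train/test splits.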

Conclusion: We recommend training models to estimate the age of An. arabiensis and An. gambiae using ANN model architectures (especially for datasets with at least 70 mosquitoes per age group) and directly training a binary classifier instead of training a regression model and interpreting it as a binary classifier.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aging*
  • Animals
  • Anopheles / classification
  • Anopheles / physiology*
  • Female
  • Malaria / diagnosis*
  • Malaria / parasitology
  • Male
  • Models, Statistical
  • Neural Networks, Computer*
  • Plasmodium / isolation & purification*
  • Population Density
  • Spectroscopy, Near-Infrared / methods*

Grants and funding

This study was funded by Grand Challenges Canada Stars for Global Health, funded by the Government of Canada (grant 043901, awarded to MTSL), and by the Marquette University Graduate School (studentship awarded to MPM).