Determining the most important physiological and agronomic traits contributing to maize grain yield through machine learning algorithms: a new avenue in intelligent agriculture

PLoS One. 2014 May 15;9(5):e97288. doi: 10.1371/journal.pone.0097288. eCollection 2014.

Abstract

Prediction is an attempt to accurately forecast the outcome of a specific situation using input information obtained from a set of variables that potentially describe that situation. Predictive models can be used to project physiological and agronomic processes, where traits such as yield can be affected by a large number of variables. In this study, we analyzed a large number of physiological and agronomic traits with screening, clustering, and decision tree models to select the factors most relevant to increasing maize grain yield. Decision tree models (all with nearly the same performance) were the most useful tools for understanding the underlying relationships among physiological and agronomic features and for selecting the traits most relevant to maize grain yield (sowing date-location, kernel number per ear, maximum water content, kernel weight, and season duration). In particular, the decision tree generated by the C&RT algorithm was the best model for predicting yield from physiological and agronomic traits, and it can be employed extensively in future breeding programs. Feature selection filtering of the data produced no significant differences in the decision tree models, but it had a positive effect on the clustering models. Finally, the results showed that the proposed modeling techniques are useful tools for crop physiologists searching large datasets for patterns in physiological and agronomic factors, and that they may assist in selecting the most important traits for an individual site and field. Decision tree models are the method of choice because they can illustrate different pathways to yield increase in breeding programs, through their hierarchical ranking of features and their discovery of patterns across various combinations of features.
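To make the idea of a tree's feature hierarchy concrete, the sketch below ranks candidate traits by how much a single binary split on each one reduces the squared error of the target, which is the criterion regression trees in the C&RT family use to choose a split. It is a minimal illustration, not the study's pipeline or data: the rows are a synthetic grid, `plot_id` and `ear_yield_g` are hypothetical names, and treating per-ear yield as kernel number times kernel weight is a simplifying assumption made only for this example.

```python
# Toy illustration only: the rows below are a synthetic grid, not the study's dataset.

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, feature, target):
    """Return (gain, threshold) for the best binary split on one feature.

    Gain is the reduction in squared error: SSE(parent) - SSE(left) - SSE(right).
    Candidate thresholds are midpoints between consecutive observed values.
    """
    parent = sse([r[target] for r in rows])
    levels = sorted({r[feature] for r in rows})
    best_gain, best_threshold = 0.0, None
    for lo, hi in zip(levels, levels[1:]):
        threshold = (lo + hi) / 2
        left = [r[target] for r in rows if r[feature] <= threshold]
        right = [r[target] for r in rows if r[feature] > threshold]
        gain = parent - sse(left) - sse(right)
        if gain > best_gain:
            best_gain, best_threshold = gain, threshold
    return best_gain, best_threshold


def rank_features(rows, features, target):
    """Rank features by the gain of their best root split, largest first."""
    scored = [(f, *best_split(rows, f, target)) for f in features]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Synthetic full-factorial grid (assumption): yield per ear = kernel number x kernel weight.
rows = [
    {
        "kernel_number_per_ear": kn,
        "kernel_weight_g": kw,
        "plot_id": plot,  # hypothetical feature that carries no yield information
        "ear_yield_g": kn * kw,
    }
    for kn in (300, 400, 500, 600)
    for kw in (0.25, 0.30)
    for plot in (0, 1)
]

features = ["kernel_number_per_ear", "kernel_weight_g", "plot_id"]
ranking = rank_features(rows, features, "ear_yield_g")

for name, gain, threshold in ranking:
    print(f"{name}: gain={gain:.1f}, threshold={threshold}")
```

A full tree repeats this search inside each resulting subset, so the trait chosen at the root and the traits chosen beneath it form the kind of hierarchy the abstract describes. In this toy grid the uninformative `plot_id` gets essentially zero gain, which is why it would never be chosen as a split.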

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Agriculture / methods*
  • Algorithms
  • Artificial Intelligence*
  • Cluster Analysis
  • Data Mining / methods
  • Decision Trees
  • Genes, Plant
  • Seeds / physiology*
  • Zea mays / genetics*

Grants and funding

This study was supported by a grant (AGR-emam) from Shiraz University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.