Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data

PLoS One. 2015 Dec 21;10(12):e0144370. doi: 10.1371/journal.pone.0144370. eCollection 2015.

Abstract

It is common in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations in these data arrays and developed a novel approach based on hierarchical clustering. Multiple imputation (MI) was applied in four ways: multiple agglomerative hierarchical clustering, a normal distribution model, a normal regression model, and predictive mean matching. The latter three models were fitted using both Bayesian and non-Bayesian analysis. The first approach instead used a clustering procedure on randomly selected attributes and assigned real values from the nearest neighbour to each entry with a missing observation. Different proportions of data entries in six complete datasets were randomly set to missing, and the MI methods were compared on the efficiency and accuracy with which they estimated those values. The results indicated that the models using Bayesian analysis estimated missing values slightly more accurately than those using non-Bayesian analysis, but were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering showed the best overall performance.
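The abstract describes the clustering-based imputation only in outline. The Python sketch below illustrates the general idea under stated assumptions and is not the authors' published algorithm. It assumes the three-way array (genotype × environment × attribute) is flattened so that each genotype-environment combination is a row and each attribute is a column. Each missing entry receives the observed value of its nearest neighbour, with distance measured over a randomly selected subset of the other attributes. Repeating this M times gives multiple imputations, which are averaged. This is a simplified nearest-neighbour (hot-deck) stand-in that takes the nearest neighbour directly rather than building a full agglomerative dendrogram. The helper names (`mask_entries`, `nn_donor_impute`, `multiple_impute`, `rmse_on_mask`), the array layout, the column-mean fallback, and the RMSE accuracy measure are all hypothetical. The masking step mirrors the evaluation design of hiding a random proportion of entries in a complete dataset.

```python
import numpy as np


def mask_entries(Y, prop, rng):
    """Hide a random proportion `prop` of entries (set to NaN); return the copy and the mask."""
    mask = rng.random(Y.shape) < prop
    Y_miss = Y.astype(float).copy()
    Y_miss[mask] = np.nan
    return Y_miss, mask


def nn_donor_impute(X, n_attrs, rng):
    """One imputation: fill each missing X[i, j] with the observed X[k, j] of the
    nearest row k, using RMS distance over a random subset of the other attributes."""
    n_rows, n_cols = X.shape
    filled = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        # Randomly select attributes (columns) other than the target column j.
        others = [c for c in range(n_cols) if c != j]
        attrs = rng.choice(others, size=min(n_attrs, len(others)), replace=False)
        best, best_d = None, np.inf
        for k in range(n_rows):
            if k == i or np.isnan(X[k, j]):
                continue  # a donor must have the target attribute observed
            shared = [a for a in attrs if not np.isnan(X[i, a]) and not np.isnan(X[k, a])]
            if not shared:
                continue  # no common observed attributes to compare on
            d = np.sqrt(np.mean((X[i, shared] - X[k, shared]) ** 2))
            if d < best_d:
                best, best_d = k, d
        # Donate a real observed value; fall back to the column mean (assumption).
        filled[i, j] = X[best, j] if best is not None else np.nanmean(X[:, j])
    return filled


def multiple_impute(X, m=5, n_attrs=2, seed=0):
    """Repeat the randomised donor imputation m times and average the draws."""
    rng = np.random.default_rng(seed)
    draws = [nn_donor_impute(X, n_attrs, rng) for _ in range(m)]
    return np.mean(draws, axis=0), draws


def rmse_on_mask(Y_true, Y_hat, mask):
    """Root-mean-square error over the entries that were hidden."""
    return float(np.sqrt(np.mean((Y_true[mask] - Y_hat[mask]) ** 2)))


# Usage on synthetic data (hypothetical dimensions, not the paper's datasets).
rng = np.random.default_rng(1)
G, E, A = 10, 4, 5                             # genotypes, environments, attributes
Y = rng.normal(size=(G, E, A)) + np.arange(A)  # synthetic complete three-way array
X_full = Y.reshape(G * E, A)                   # rows: genotype-environment combinations
X_miss, mask = mask_entries(X_full, 0.1, rng)  # hide about 10% of entries
X_hat, draws = multiple_impute(X_miss, m=5, n_attrs=2)
rmse = rmse_on_mask(X_full, X_hat, mask)
print(f"RMSE on masked entries: {rmse:.3f}")
```

Drawing a fresh random attribute subset for every imputation is what makes the M draws differ, so their spread gives a rough sense of imputation uncertainty. Because each draw copies a real donor value, the individual imputations stay within the observed range of each attribute.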

MeSH terms

  • Models, Genetic*
  • Plant Breeding*
  • Plants / genetics*

Grants and funding

The principal supervisor (KB) of TT's higher degree by research (PhD) study has a research collaboration with Monsanto Company on unrelated activities. TT receives a living allowance from these funds. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.