Missing value imputation strategies for metabolomics data

Emily Grace Armitage; Joanna Godzien; Vanesa Alonso-Herranz; Ángeles López-Gonzálvez; Coral Barbas

doi:10.1002/elps.201500352

Electrophoresis. 2015 Dec;36(24):3050-60. doi: 10.1002/elps.201500352. Epub 2015 Oct 20.

Authors

Emily Grace Armitage¹, Joanna Godzien¹, Vanesa Alonso-Herranz¹, Ángeles López-Gonzálvez¹, Coral Barbas¹

Affiliation

¹ Centre for Metabolomics and Bioanalysis (CEMBIO), Facultad de Farmacia, Universidad CEU San Pablo, Madrid, Spain.

PMID: 26376450
DOI: 10.1002/elps.201500352

Abstract

Missing values can arise for different reasons, and depending on their origin they should be considered and handled differently. In this research, four imputation methods were compared with respect to their effects on the normality and variance of data, on statistical significance, and on the approximation of a suitable threshold for accepting missing data as truly missing. Additionally, different strategies for controlling the familywise error rate or the false discovery rate were evaluated, together with how they interact with the different missing value imputation strategies. Missing values were found to affect the normality and variance of data, and k-nearest neighbour imputation was the best method tested for restoring them. Bonferroni correction was the best method for maximizing true positives and minimizing false positives, and it was observed that missing values could be truly missing even when as little as 40% of the data were missing. The range between 40 and 70% missing values was defined as a "gray area", and a strategy has therefore been proposed that balances the optimal imputation strategy, k-nearest neighbour, against the best approximation of where real zeros lie.

Keywords: CE-MS; Data; False-discovery rate; Imputation; Metabolomics; Missing values; k-nearest neighbour.
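
The balance described in the abstract, between k-nearest neighbour imputation and placing real zeros, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the exact rule, so the function names, the 0.4 cut-off (borrowed from the "as little as 40%" observation), the per-feature missing fraction, and the distance measure are all assumptions made for illustration. Missing values are represented as None, rows are samples, and columns are features.

```python
import math


def missing_fraction(column):
    """Fraction of entries in one feature (column) that are missing (None)."""
    return sum(1 for v in column if v is None) / len(column)


def knn_impute(rows, k=2):
    """Fill each None with the mean of that feature in the k nearest samples.

    Distance between two samples is the root mean squared difference over
    the features observed in both (an illustrative choice, not the paper's).
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap if no neighbour has this feature
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


def impute_with_threshold(rows, zero_threshold=0.4, k=2):
    """Hypothetical hybrid: features whose missing fraction reaches
    zero_threshold are treated as truly missing (set to 0.0); all other
    gaps are filled by k-nearest neighbour imputation."""
    n_features = len(rows[0])
    zero_cols = {j for j in range(n_features)
                 if missing_fraction([r[j] for r in rows]) >= zero_threshold}
    knn_filled = knn_impute(rows, k)
    return [[0.0 if (v is None and j in zero_cols) else filled[j]
             for j, v in enumerate(original)]
            for original, filled in zip(rows, knn_filled)]


# Toy data: 4 samples x 3 features (values are made up for illustration).
rows = [
    [1.0, 2.0, None],
    [1.1, None, None],
    [5.0, 6.0, 3.0],
    [1.2, 2.2, None],
]
result = impute_with_threshold(rows, zero_threshold=0.4, k=2)
```

In this toy example the third feature is missing in three of four samples, so its gaps become zeros, while the single gap in the second feature is filled from the two most similar samples. A real analysis would need to choose the cut-off within the 40 to 70% range, and decide whether missingness is assessed per feature or per sample group, based on the study design.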

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Biomarkers / blood
Electrophoresis, Capillary / methods*
Humans
Mass Spectrometry / methods*
Metabolome
Metabolomics / methods*

Substances

Biomarkers