Xputer: bridging data gaps with NMF, XGBoost, and a streamlined GUI experience

Front Artif Intell. 2024 Apr 24;7:1345179. doi: 10.3389/frai.2024.1345179. eCollection 2024.

Abstract

The rapid proliferation of data across diverse fields has heightened the importance of accurately imputing missing values, a task crucial for ensuring data integrity and deriving meaningful insights. In response to this challenge, we present Xputer, a novel imputation tool that integrates Non-negative Matrix Factorization (NMF) with the predictive strengths of XGBoost. Xputer is versatile: it supports zero imputation, enables hyperparameter optimization through Optuna, and allows users to define the number of iterations. To improve accessibility, Xputer includes an intuitive Graphical User Interface (GUI) that makes it easy to use, even for those less familiar with computational tools. In performance benchmarks, Xputer often outperforms IterativeImputer in imputation accuracy. Furthermore, Xputer autonomously handles a diverse range of data types, including categorical, continuous, and Boolean, eliminating the need for prior preprocessing. Given its blend of performance, flexibility, and user-friendly design, Xputer emerges as a state-of-the-art solution for data imputation.
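The abstract does not spell out how NMF and XGBoost are combined, so the following is only a minimal sketch of one ingredient: filling missing entries of a non-negative table with a masked NMF fitted on the observed cells. It is not Xputer's API. The function name `masked_nmf_fill`, its parameters, and the toy data are hypothetical, and the multiplicative-update rule is a standard textbook choice that is assumed here rather than taken from the paper. A model-based step (such as XGBoost predicting each column from the others) could then refine such an initial fill, but that step is not shown.

```python
import random


def masked_nmf_fill(X, rank=1, n_iter=200, seed=0, eps=1e-9):
    """Fill None entries of a non-negative matrix X (list of rows).

    Fits X ~= W @ H on the observed cells only, using multiplicative
    updates, then uses W @ H to fill the missing cells.
    Hypothetical helper for illustration; not part of Xputer.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Strictly positive random initialisation keeps the updates well defined.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    observed = [(i, j) for i in range(n) for j in range(m) if X[i][j] is not None]

    def predict(i, j):
        return sum(W[i][k] * H[k][j] for k in range(rank))

    for _ in range(n_iter):
        # Update W using only the observed cells.
        num = [[0.0] * rank for _ in range(n)]
        den = [[0.0] * rank for _ in range(n)]
        for i, j in observed:
            p = predict(i, j)
            for k in range(rank):
                num[i][k] += X[i][j] * H[k][j]
                den[i][k] += p * H[k][j]
        for i in range(n):
            for k in range(rank):
                W[i][k] *= num[i][k] / (den[i][k] + eps)

        # Update H using only the observed cells.
        num = [[0.0] * m for _ in range(rank)]
        den = [[0.0] * m for _ in range(rank)]
        for i, j in observed:
            p = predict(i, j)
            for k in range(rank):
                num[k][j] += W[i][k] * X[i][j]
                den[k][j] += W[i][k] * p
        for k in range(rank):
            for j in range(m):
                H[k][j] *= num[k][j] / (den[k][j] + eps)

    # Keep observed values as they are; fill only the gaps.
    return [
        [X[i][j] if X[i][j] is not None else predict(i, j) for j in range(m)]
        for i in range(n)
    ]


# Toy rank-1 table (outer product of [1, 2, 3] and [1, 2, 4]);
# the centre cell, whose true value is 4, is hidden.
data = [
    [1.0, 2.0, 4.0],
    [2.0, None, 8.0],
    [3.0, 6.0, 12.0],
]
filled = masked_nmf_fill(data, rank=1)
print(filled[1][1])
```

Because the toy table is exactly rank 1, a rank-1 factorization can recover the hidden cell; real tabular data rarely has such structure, which is one reason a purely factorization-based fill may benefit from a subsequent predictive model.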

Keywords: ensemble learning; imputation; matrix factorization; mix-type data; tabular data.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was supported by the Crafoord Foundation (JK #20230775), the Swedish Cancer Society (JK #19 0004 FE and LR #21 1444 Pj), the Swedish Research Council (LR #2021-03055), the Swedish Childhood Cancer Foundation (JK #PR2022-0106), SUS Stiftelser och Donationer (LR #95512), and Governmental Funding of Clinical Research within the National Health Service (ALF) (LR #40609).