HDSNE a new unsupervised multiple image database fusion learning algorithm with flexible and crispy production of one database: a proof case study of lung infection diagnose In chest X-ray images

BMC Med Imaging. 2023 Sep 18;23(1):134. doi: 10.1186/s12880-023-01078-3.

Abstract

Continuous release of image databases with fully or partially identical inner categories dramatically deteriorates the production of autonomous Computer-Aided Diagnostics (CAD) systems for true comprehensive medical diagnostics. The first challenge is the frequent massive bulk release of medical image databases, which often suffer from two common drawbacks: image duplication and corruption. The many subsequent releases of the same data with the same classes or categories come with no clear evidence of success in the concatenation of those identical classes among image databases. This issue stands as a stumbling block in the path of hypothesis-based experiments for the production of a single learning model that can successfully classify all of them correctly. Removing redundant data, enhancing performance, and optimizing energy resources are among the most challenging aspects. In this article, we propose a global data aggregation scale model that incorporates six image databases selected from specific global resources. The proposed valid learner is based on training all the unique patterns within any given data release, thereby creating a unique dataset hypothetically. The Hash MD5 algorithm (MD5) generates a unique hash value for each image, making it suitable for duplication removal. The T-Distributed Stochastic Neighbor Embedding (t-SNE), with a tunable perplexity parameter, can represent data dimensions. Both the Hash MD5 and t-SNE algorithms are applied recursively, producing a balanced and uniform database containing equal samples per category: normal, pneumonia, and Coronavirus Disease of 2019 (COVID-19). We evaluated the performance of all proposed data and the new automated version using the Inception V3 pre-trained model with various evaluation metrics. The performance outcome of the proposed scale model showed more respectable results than traditional data aggregation, achieving a high accuracy of 98.48%, along with high precision, recall, and F1-score. The results have been proved through a statistical t-test, yielding t-values and p-values. It's important to emphasize that all t-values are undeniably significant, and the p-values provide irrefutable evidence against the null hypothesis. Furthermore, it's noteworthy that the Final dataset outperformed all other datasets across all metric values when diagnosing various lung infections with the same factors.

Keywords: COVID-19; Coronavirus; Data aggregation; Inception V3; MD5; Model production; Transfer Learning; X-ray; t-SNE.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • COVID-19* / diagnostic imaging
  • Humans
  • Lung / diagnostic imaging
  • Pneumonia* / diagnostic imaging
  • X-Rays