Discovery of Knowledge in the Incidence of a Type of Lung Cancer for Patients through Data Mining Models

Yousif Saleh Ibrahim; Yasser Muhammed; Asaad T Al-Douri; Muhammad Shahzad Faisal; Abdulsattar Abdullah H Mohamad; Abdallah Al-Husban; Mequanint Birhan

doi:10.1155/2022/6058213

Comput Intell Neurosci. 2022 May 31;2022:6058213. doi: 10.1155/2022/6058213. eCollection 2022.

Authors

Yousif Saleh Ibrahim¹, Yasser Muhammed², Asaad T Al-Douri³, Muhammad Shahzad Faisal⁴, Abdulsattar Abdullah H Mohamad⁵ ⁶, Abdallah Al-Husban⁷ ⁸, Mequanint Birhan⁹

Affiliations

¹ Department of Medical Laboratory Techniques, Al-Maarif University College, Al-Anbar, Iraq.
² College of Technical Engineering, Al-Farahidi University, Baghdad, Iraq.
³ Department of Dental Industry, College of Medical Technology, Al-Kitab University, Altun Kupri, Iraq.
⁴ COMSATS University Islamabad, Attock Campus, Punjab, Pakistan.
⁵ The University of Mashreq, Research Center, Baghdad, Iraq.
⁶ Department of Medical Laboratory Techniques, Dijlah University College, Baghdad 10021, Iraq.
⁷ Department of Mathematics, Faculty of Science and Technology, Irbid, Jordan.
⁸ National University, P.O. Box: 2600, Irbid, Jordan.
⁹ Department of Mechanical Engineering, Mizan-Tepi University, Tepi, Ethiopia.

Abstract

This paper presents research results on the contribution of user-centered data mining, based on standard principles, to the analysis of survival and mortality in lung cancer cases. The researchers used anonymized data from previously diagnosed cases in a health database to predict the condition of new patients whose results were not yet available. Medical professionals specializing in this field provided feedback on the usefulness of the new software, which was built with the WEKA data mining tools and the Naive Bayes method. The results offer a basis for discussing the value of discovering relationships in apparently "hidden" information in order to propose strategies that counteract health problems or prevent future complications, and thus contribute to improving the quality of care and the life of the population. Data mining in the health area has shown applicability in the early detection and prevention of diseases and in the analysis of genetic markers to determine the probability of a satisfactory response to medical treatment. The most accurate model was Naive Bayes (91.1%). Its closest competitor, bagging, came second with 90.8%. The ZeroR algorithm had the lowest success rate, at 80%.
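
To make the comparison in the abstract concrete, the following is a minimal sketch of a ZeroR baseline next to a categorical Naive Bayes classifier. It is not the authors' WEKA workflow: the function names, the two features (smoker, stage), and the ten records are all invented for illustration, and the score is computed on the training records only to keep the example short. ZeroR ignores every feature and always predicts the most frequent class, so its accuracy is simply the share of that class. This is why it is usually read as a floor that other models should beat.

```python
from collections import Counter, defaultdict
import math


def zero_r_fit(labels):
    """ZeroR: ignore all features and remember the most frequent class."""
    return Counter(labels).most_common(1)[0][0]


def naive_bayes_fit(rows, labels):
    """Categorical Naive Bayes: count classes and per-class feature values."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def naive_bayes_predict(model, row):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Laplace (add-one) smoothing avoids zero probabilities.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical toy records (smoker, stage) -> outcome. NOT data from the study.
rows = [("yes", "late"), ("yes", "late"), ("no", "early"), ("no", "early"),
        ("yes", "early"), ("no", "early"), ("no", "early"), ("yes", "early"),
        ("no", "late"), ("no", "early")]
labels = ["died", "died", "survived", "survived", "survived",
          "survived", "survived", "survived", "died", "survived"]

majority_class = zero_r_fit(labels)
zero_r_accuracy = accuracy([majority_class] * len(labels), labels)

nb_model = naive_bayes_fit(rows, labels)
nb_predictions = [naive_bayes_predict(nb_model, row) for row in rows]
nb_accuracy = accuracy(nb_predictions, labels)

# Training-set scores only; a real evaluation would use held-out data
# or cross-validation.
print("ZeroR accuracy:", zero_r_accuracy)
print("Naive Bayes accuracy:", nb_accuracy)
```

Read this way, the 80% reported for ZeroR would be consistent with about four in five evaluated cases sharing one outcome, although the abstract does not state the class distribution or the evaluation protocol.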

MeSH terms

Algorithms
Bayes Theorem
Data Mining* / methods
Humans
Incidence
Lung Neoplasms* / epidemiology