Presentation of a model-based data mining to predict lung cancer

J Res Health Sci. 2015 Summer;15(3):189-95.

Abstract

Background: Patient data often contain useful information that can help resolve many problems in different areas. This study was performed in 2014 to present a data mining-based model for predicting lung cancer.

Methods: In this exploratory modeling study, information was collected by two methods: library research and field work. The variables were gathered on a data transfer form from patients with pulmonary problems (303 records) and comprised 26 fields, including clinical and environmental variables. The validity of the data transfer form was established by consensus, using a group-meeting method with purposive sampling, over several meetings among members of the research group and the lung group. The methodology was based on the classification and prediction methods of data mining, using supervised learning with classification and regression tree algorithms in Clementine 12 software.
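As a rough illustration of the kind of split search a classification and regression tree performs, the following minimal Python sketch finds the threshold on one numeric variable that minimizes weighted Gini impurity. It is not the study's Clementine 12 workflow: the use of Gini impurity as the split criterion is an assumption, the function names `gini` and `best_split` are hypothetical, and the nodule diameters and labels are made-up toy values, not the study's records.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the unsplit node, the threshold is None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical values cannot be separated
        threshold = (low + high) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: nodule diameter in mm, label 1 = cancer, 0 = no cancer.
diameters = [4, 6, 8, 12, 18, 25, 30, 9]
labels = [0, 0, 0, 1, 1, 1, 1, 0]

threshold, impurity = best_split(diameters, labels)
print(threshold, impurity)
```

A full tree would apply this search to every variable, split on the best one, and recurse on each branch until a stopping rule is met.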

Results: For clinical variables, the model's precision was high in all three partitions: training, test, and validation. For environmental variables, the maximum precision was 76% in the training partition (C&R algorithm), 61% in the test partition (Neural Net algorithm), and 57% in the validation partition (Neural Net algorithm).

Conclusion: For clinical variables, the C5.0, CHAID, and C&R models were stable and suitable for detection of lung cancer. For environmental variables, the C&R model was stable and suitable for detection of lung cancer. Pulmonary nodules, pleural effusion, diameter of pulmonary nodules, and location of pulmonary nodules were the variables with the greatest impact on detection of lung cancer.

Keywords: Data Mining; Decision Tree; Lung Cancer; Neural Networks.

MeSH terms

  • Data Mining*
  • Forecasting
  • Humans
  • Lung Neoplasms*
  • Models, Theoretical*