Application of data mining techniques and data analysis methods to measure cancer morbidity and mortality data in a regional cancer registry: The case of the island of Crete, Greece

Comput Methods Programs Biomed. 2017 Jul:145:73-83. doi: 10.1016/j.cmpb.2017.04.011. Epub 2017 Apr 13.

Abstract

Background and objective: Micro- or macro-level mapping of cancer statistics is a challenging task that requires long-term planning, prospective studies and continuous monitoring of all cancer cases. The objective of the current study is to show how cancer registry data can be processed using data mining techniques in order to improve the outcomes of statistical analysis.

Methods: Data were collected from the Cancer Registry of Crete in Greece (counties of Rethymno and Lasithi) for the period 1998-2004. Data were recorded on paper forms and manually transcribed to a single data file, which introduced errors and noise (e.g., missing and erroneous values, duplicate entries). Data were pre-processed and prepared for analysis using data mining tools and algorithms. Feature selection was applied to evaluate the contribution of each collected feature to predicting patients' survival. Several classifiers were trained and evaluated for their ability to predict patient survival. Finally, statistical analysis of cancer morbidity and mortality rates in the two regions was performed in order to validate the initial findings.
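
As an illustration of the kind of pre-processing described above, the following is a minimal sketch that flags missing or erroneous values and drops duplicate entries. It is not the authors' pipeline: the field names (`patient_id`, `sex`, `age`, `diagnosis_year`, `site`), the age bounds and the duplicate rule are all hypothetical, since the abstract does not describe the registry schema or the tools used. Only the 1998-2004 study period is taken from the text.

```python
from typing import Optional

# Hypothetical field names; the registry's actual schema is not given in the abstract.
REQUIRED_FIELDS = ("patient_id", "sex", "age", "diagnosis_year", "site")


def normalize(record: dict) -> dict:
    """Trim and lower-case string values so near-duplicate entries compare equal."""
    return {k: v.strip().lower() if isinstance(v, str) else v for k, v in record.items()}


def validate(record: dict) -> Optional[str]:
    """Return a reason string if the record is unusable, otherwise None."""
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            return f"missing {field}"
    age, year = record["age"], record["diagnosis_year"]
    # Assumed plausibility bounds for age; the study period comes from the abstract.
    if not isinstance(age, int) or not 0 <= age <= 120:
        return "age out of range"
    if not isinstance(year, int) or not 1998 <= year <= 2004:
        return "year outside study period"
    return None


def clean(records: list) -> tuple:
    """Split records into (kept, rejected); rejected items carry the reason."""
    seen, kept, rejected = set(), [], []
    for raw in records:
        rec = normalize(raw)
        reason = validate(rec)
        if reason:
            rejected.append((rec, reason))
            continue
        # Assumed duplicate rule: identical values in all required fields.
        key = tuple(rec[f] for f in REQUIRED_FIELDS)
        if key in seen:
            rejected.append((rec, "duplicate"))
            continue
        seen.add(key)
        kept.append(rec)
    return kept, rejected
```

Keeping the rejected records together with a reason, rather than silently discarding them, makes it possible to audit how much noise the manual transcription introduced.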

Results: Several critical points in the process of data collection, preprocessing and analysis of cancer data were derived from the results, and a road-map for future population data studies was developed. In addition, higher morbidity rates were observed in the counties of Crete (age-standardized morbidity/incidence rates, ASIR = 396.45 ± 2.89 and 274.77 ± 2.48 for men and women, respectively) than the European and world averages (ASIR = 281.6 and 207.3 for men and women in Europe, and 203.8 and 165.1 worldwide). Significant variation in cancer types between sexes and age groups was also observed: the ratio of deaths to reported cases is 0.055 for patients younger than 34 years, compared with 0.366 for patients older than 75 years.
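
For readers unfamiliar with age-standardized rates, the sketch below shows direct standardization, the usual way such rates are computed: age-specific rates are averaged using the weights of a standard population. The abstract does not state which standard population, age bands or rate denominator the authors used, so the per-100,000 scaling, the function names and all numbers in the usage example are assumptions for illustration only and are not data from the study.

```python
def age_standardized_rate(cases, population, std_weights, per=100_000):
    """Directly standardized rate: weighted mean of age-specific rates.

    cases, population and std_weights are parallel sequences, one entry per age group.
    """
    if not (len(cases) == len(population) == len(std_weights)):
        raise ValueError("all inputs must cover the same age groups")
    weighted = sum(w * c / p for c, p, w in zip(cases, population, std_weights))
    return per * weighted / sum(std_weights)


def deaths_to_cases_ratio(deaths, reported_cases):
    """Ratio between deaths and reported cases for one group of patients."""
    return deaths / reported_cases


# Illustrative numbers only (three made-up age groups, not registry data).
example_rate = age_standardized_rate(
    cases=[10, 50, 200],
    population=[40_000, 30_000, 20_000],
    std_weights=[50, 30, 20],
)
```

Because the weights come from a fixed reference population, rates computed this way can be compared across regions with different age structures, which is what allows the Crete figures to be set against European and world averages.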

Conclusions: This study introduced a methodology for preprocessing and analyzing cancer data using a combination of data mining techniques, which could be a useful tool for other researchers and for the further enhancement of cancer registries.

Keywords: Cancer data; Crete; Data mining; Feature selection; Greece.

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Algorithms
  • Data Mining*
  • Female
  • Greece / epidemiology
  • Humans
  • Incidence
  • Male
  • Middle Aged
  • Morbidity
  • Neoplasms / mortality*
  • Prospective Studies
  • Registries
  • Young Adult