Application of data mining techniques to explore predictors of HCC in Egyptian patients with HCV-related chronic liver disease

Asian Pac J Cancer Prev. 2015;16(1):381-5. doi: 10.7314/apjcp.2015.16.1.381.

Abstract

Background: Hepatocellular carcinoma (HCC) is the second most common malignancy in Egypt. Data mining is a predictive-analysis approach that can explore very large volumes of information to discover hidden patterns and relationships. Our aim was to develop a non-invasive algorithm for predicting HCC that is economical, reliable, easy to apply, and acceptable to domain experts.

Methods: This cross-sectional study enrolled 315 patients with hepatitis C virus (HCV)-related chronic liver disease (CLD): 135 with HCC, 116 with cirrhosis but without HCC, and 64 with chronic hepatitis C. Using data mining analysis, we built a decision tree model to predict HCC.
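The abstract does not name the tree learner or its splitting criterion. As a hedged illustration only, the sketch below shows one common way a decision tree learner can choose a cutoff for a numeric attribute such as AFP: it tries midpoints between consecutive distinct values and keeps the one with the highest information gain (the ID3 criterion; C4.5 refines it to gain ratio). The function names and the toy values in the usage note are hypothetical and are not study data.

```python
import math


def entropy(labels):
    """Shannon entropy (bits) of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_cutoff(values, labels):
    """Return (cutoff, gain) maximizing information gain for the split value >= cutoff.

    Candidate cutoffs are midpoints between consecutive distinct sorted values.
    Returns (None, 0.0) if no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    n = len(pairs)
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        cut = (lo + hi) / 2
        left = [y for _, y in pairs[:i]]   # values below the cutoff
        right = [y for _, y in pairs[i:]]  # values at or above the cutoff
        remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        gain = base - remainder
        if gain > best[1]:
            best = (cut, gain)
    return best


# Hypothetical toy AFP values (ng/ml) with 1 = HCC, 0 = no HCC.
toy_afp = [5, 12, 30, 45, 60, 90, 200, 400]
toy_hcc = [0, 0, 0, 0, 1, 1, 1, 1]
cut, gain = best_cutoff(toy_afp, toy_hcc)
print(cut, gain)  # 52.5 1.0
```

On the toy values the two classes separate perfectly between 45 and 60, so the midpoint 52.5 gives the maximal gain of 1 bit; on real data the chosen cutoff depends on the learner and the training set.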

Results: The decision tree algorithm predicted HCC with a recall (sensitivity) of 83.5% and a precision of 83.3% using only routine data. It correctly classified 259 of 315 instances (82.2%) and misclassified 56 (17.8%). Out of 29 attributes, serum alpha-fetoprotein (AFP), with an optimal cutoff value of ≥50.3 ng/ml, was selected as the best predictor of HCC. To a lesser extent, male sex, presence of cirrhosis, AST > 64 U/L, and ascites were associated with HCC.
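Recall, precision, specificity, and accuracy are all derived from the same confusion matrix but use different denominators, so precision and specificity are distinct measures. The abstract reports only the correct and incorrect totals, not the full confusion matrix, so the sketch below recomputes only the overall accuracy from those counts; the helper name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "recall_sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),         # true negative rate
        "precision_ppv": tp / (tp + fp),       # positive predictive value
    }


# Overall accuracy from the reported counts: 259 correct, 56 incorrect.
accuracy = 259 / (259 + 56)
print(f"{accuracy:.1%}")  # 82.2%
```

Because specificity divides by all true negatives plus false positives while precision divides by all predicted positives, the two coincide only by chance; they cannot be used interchangeably.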

Conclusion: Data mining analysis allows discovery of hidden patterns and enables the development of models that predict HCC from routine data, as an alternative to CT and liver biopsy. This study identified a new cutoff for AFP (≥50.3 ng/ml). A score of more than 2 of the 5 risk variables successfully predicted HCC with a sensitivity of 96% and a specificity of 82%.
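The abstract does not list the five variables that make up the score. Assuming they are the five named in the Results (AFP ≥ 50.3 ng/ml, male sex, cirrhosis, AST > 64 U/L, and ascites) and that each counts as one point, a minimal sketch of the scoring rule could look like this. The `Patient` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    afp_ng_ml: float  # serum alpha-fetoprotein, ng/ml
    male: bool
    cirrhosis: bool
    ast_u_l: float    # aspartate aminotransferase, U/L
    ascites: bool


def risk_score(p: Patient) -> int:
    """Count how many of the five assumed risk variables are present (0-5).

    Assumption: unweighted, one point per variable, cutoffs from the Results.
    """
    return sum([
        p.afp_ng_ml >= 50.3,
        p.male,
        p.cirrhosis,
        p.ast_u_l > 64,
        p.ascites,
    ])


def predict_hcc(p: Patient) -> bool:
    """Flag as HCC when more than 2 of the 5 risk variables are present."""
    return risk_score(p) > 2


example = Patient(afp_ng_ml=120.0, male=True, cirrhosis=True, ast_u_l=40.0, ascites=False)
print(risk_score(example), predict_hcc(example))  # 3 True
```

An unweighted count keeps the rule easy to apply at the bedside; a weighted score would need coefficients that the abstract does not provide.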

MeSH terms

  • Age Factors
  • Algorithms
  • Biomarkers, Tumor / blood*
  • Carcinoma, Hepatocellular / diagnosis
  • Carcinoma, Hepatocellular / epidemiology*
  • Carcinoma, Hepatocellular / virology
  • Computational Biology
  • Cross-Sectional Studies
  • Data Mining / methods*
  • Decision Trees
  • Early Diagnosis
  • Egypt / epidemiology
  • Female
  • Hepacivirus
  • Hepatitis C, Chronic / complications
  • Humans
  • Liver Cirrhosis
  • Liver Neoplasms / diagnosis
  • Liver Neoplasms / epidemiology*
  • Liver Neoplasms / virology
  • Male
  • Middle Aged
  • Predictive Value of Tests
  • Sex Factors
  • alpha-Fetoproteins / metabolism*

Substances

  • Biomarkers, Tumor
  • alpha-Fetoproteins