[The value of CT radiomics in the prediction of EGFR mutation in lung cancer]

Zhonghua Yi Xue Za Zhi. 2020 Mar 10;100(9):690-695. doi: 10.3760/cma.j.issn.0376-2491.2020.09.009.
[Article in Chinese]

Abstract

Objective: To evaluate the value of quantitative CT radiomics features in predicting epidermal growth factor receptor (EGFR) mutation in lung cancer.

Methods: Data from 144 patients with lung cancer (75 males, 69 females; median age 54 years, range 25-68) who were diagnosed at the First Affiliated Hospital of Soochow University between September 2013 and October 2018 and had EGFR gene test results were retrospectively analyzed. Of these, 81 patients (39 males, 42 females; median age 52 years, range 25-64) had EGFR mutations and 63 patients (36 males, 27 females; median age 56 years, range 32-68) had wild-type EGFR. Patients were randomly assigned to a training group and a validation group at a ratio of 2:1. MaZda software was used to extract radiomics features, including gray-level histogram (GLH), absolute gradient (GRA), gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), autoregressive model (ARM) and wavelet transform (WAV) features. Three feature selection methods, Fisher coefficient (Fisher), probability of classification error combined with average correlation coefficient (POE+ACC) and mutual information (MI), were each used to select 10 optimal features, yielding three optimal feature subsets. Each subset was analyzed with linear discriminant analysis (LDA) and nonlinear discriminant analysis (NDA) to calculate the accuracy, sensitivity and specificity in differentiating EGFR-mutant from EGFR wild-type lung cancer. An artificial neural network (ANN) prediction model was built from the optimal feature subset with the highest accuracy in the training group and was then used to differentiate EGFR-mutant from wild-type cases in the validation group.

Results: MaZda extracted a total of 301 quantitative features from the CT images of the EGFR-mutant and EGFR wild-type patients in the training group. The optimal feature subsets selected by Fisher-NDA and (POE+ACC)-NDA had the highest accuracy, 93.8%, in differentiating EGFR-mutant from wild-type lung cancer in the training group. In the validation group, the prediction model based on the Fisher-NDA optimal feature subset achieved an accuracy, sensitivity and specificity of 83.3%, 86.7% and 77.8%, respectively.

Conclusion: The optimal subset of CT radiomics features shows high accuracy in predicting EGFR mutation in lung cancer, providing a new method for predicting gene expression in lung cancer.
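As a quick arithmetic check on the reported validation figures, the sketch below computes accuracy, sensitivity and specificity from confusion-matrix counts. The counts are hypothetical: the abstract does not report them, and they are chosen here only because, under an assumed 48-patient validation group (one third of 144), they reproduce the reported percentages. Treating EGFR-mutant as the positive class is also an assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as fractions."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total      # correctly classified / all cases
    sensitivity = tp / (tp + fn)      # mutants correctly identified
    specificity = tn / (tn + fp)      # wild types correctly identified
    return accuracy, sensitivity, specificity


# Hypothetical counts (NOT reported in the abstract): 30 mutant and
# 18 wild-type cases in an assumed validation group of 48 patients.
acc, sens, spec = diagnostic_metrics(tp=26, fn=4, tn=14, fp=4)
print(f"{acc:.1%} {sens:.1%} {spec:.1%}")  # prints "83.3% 86.7% 77.8%"
```

Other count combinations may also round to the same percentages, so this only shows that the three reported values are mutually consistent, not what the actual confusion matrix was.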

目的: 探讨CT影像组学定量特征在预测肺癌表皮生长因子受体(EGFR)突变中的价值。 方法: 回顾性分析2013年9月至2018年10月在苏州大学附属第一医院确诊的144例有EGFR基因检测结果的肺癌患者的资料,男75例、女69例,中位年龄54(25~68)岁。其中,EGFR突变型81例,男39例、女42例,中位年龄52(25~64)岁;EGFR野生型63例,男36例、女27例,中位年龄56(32~68)岁。按照2∶1的比例随机分配为训练组和验证组。利用MaZda软件提取影像组学特征包括灰度直方图(GLH)、绝对梯度(GRA)、灰度共生矩阵(GLCM)、灰度游程矩阵(GLRLM)、自回归模型(ARM)和小波变换(WAV)等特征。采用费希尔参数法(Fisher)、分类错误率联合平均相关系数法(POE+ACC)和相关信息测度法(MI)3种特征选择方法对提取的定量特征进行筛选,分别选择10个相关的最优特征,得到最优特征子集。然后用线性判别分析法(LDA)和非线性判别分析法(NDA)对三组最优特征子集进行分析,计算出其鉴别肺癌EGFR突变型与野生型的准确度、敏感度和特异度,利用人工神经网络(ANN)对训练组准确度最高的最优特征子集建立预测模型,并利用建立的预测模型,对验证组肺癌EGFR突变型与野生型进行鉴别诊断。 结果: MaZda软件提取训练组肺癌EGFR突变型与野生型图像定量特征,一共301个。Fisher-NDA和(POE+ACC)-NDA法选择的最优特征子集鉴别肺癌EGFR突变型与野生型的准确度最高,为93.8%。Fisher-NDA法最优特征子集预测模型鉴别验证组中肺癌EGFR突变型与野生型的准确度、敏感度和特异度分别为83.3%、86.7%和77.8%。 结论: CT影像组学最优特征子集在预测肺癌EGFR突变中有较高的准确度,为预测肺癌基因表达提供了一种新的方法。
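The Fisher-based feature selection step named in the Methods can be illustrated with a minimal sketch. It uses a common two-class form of the Fisher score (squared difference of class means divided by the sum of class variances); MaZda's exact formula may differ, and the function names, the synthetic data and the artificially informative feature at index 5 are all assumptions made for illustration, not the study's data.

```python
import random
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Two-class Fisher score: between-class over within-class variance."""
    pos = [v for v, l in zip(values, labels) if l == 1]
    neg = [v for v, l in zip(values, labels) if l == 0]
    between = (mean(pos) - mean(neg)) ** 2
    within = pvariance(pos) + pvariance(neg)
    return between / (within + 1e-12)  # small constant avoids division by zero


def top_k_features(columns, labels, k=10):
    """Return indices of the k features with the highest Fisher score."""
    scores = [fisher_score(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Synthetic stand-in data: 96 cases, 301 features of pure noise.
random.seed(0)
labels = [1] * 48 + [0] * 48  # hypothetical class labels (1 = mutant)
columns = [[random.gauss(0, 1) for _ in labels] for _ in range(301)]
# Make hypothetical feature 5 informative by shifting the positive class.
columns[5] = [v + 3.0 * l for v, l in zip(columns[5], labels)]

selected = top_k_features(columns, labels, k=10)
print(selected)
```

Ranking features one at a time like this ignores redundancy between features, which is presumably why the study also compares correlation-aware (POE+ACC) and mutual-information (MI) selection.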

Keywords: Lung neoplasms; Radiomics; Receptor, epidermal growth factor; Tomography, X-ray computed.

MeSH terms

  • Adult
  • Aged
  • ErbB Receptors / genetics
  • Female
  • Humans
  • Lung Neoplasms* / genetics
  • Male
  • Middle Aged
  • Mutation
  • Retrospective Studies
  • Tomography, X-Ray Computed*

Substances

  • EGFR protein, human
  • ErbB Receptors