An in silico analytical study of lung cancer and smokers datasets from gene expression omnibus (GEO) for prediction of differentially expressed genes

Bioinformation. 2015 May 28;11(5):229-35. doi: 10.6026/97320630011229. eCollection 2015.

Abstract

Smoking is the leading cause of lung cancer development, and several genes have been identified as potential biomarkers for lung cancer. To contribute to the present scientific knowledge of biomarkers for lung cancer, two data sets, GDS3257 and GDS3054, were downloaded from NCBI's GEO database and normalized with the RMA and GCRMA packages (Bioconductor). Differentially expressed genes were extracted using R (3.1.2); the DAVID online tool was used for gene annotation, and the GeneMANIA tool was used to construct the gene regulatory network. Nine smoking-independent genes were found, and the average expression of those genes was nearly the same in both datasets. Five of them were found to be associated with cancer subtypes. Thirty smoking-specific genes were identified, eight of which were associated with cancer subtypes. GPR110, IL1RN and HSP90AA1 were found to be directly associated with lung cancer. SEMA6A is differentially expressed only in non-smoking lung cancer samples. FLG is a differentially expressed, smoking-specific gene and is related to the onset of various cancer subtypes. Functional annotation and network analysis revealed that FLG participates in various epidermal tissue developmental processes and is co-expressed with other genes. Lung tissues are epidermal tissues, which suggests that alteration in FLG may cause lung cancer. We conclude that smoking alters the expression of several genes and associated biological pathways during the development of lung cancer.
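The split between smoking-independent and smoking-specific genes can be illustrated with simple set operations. The sketch below is only one plausible reading of that step, not the authors' procedure: it assumes that a list of differentially expressed genes has already been obtained for each group, and the function name `partition_degs` and the gene labels `GENE_A` to `GENE_D` are hypothetical placeholders rather than results from the study.

```python
def partition_degs(smoker_degs, nonsmoker_degs):
    """Split two lists of differentially expressed genes into shared and
    group-specific sets (hypothetical helper, not from the paper)."""
    smoker = set(smoker_degs)
    nonsmoker = set(nonsmoker_degs)
    return {
        # Genes differentially expressed regardless of smoking status
        "smoking_independent": smoker & nonsmoker,
        # Genes differentially expressed only in the smoker comparison
        "smoker_only": smoker - nonsmoker,
        # Genes differentially expressed only in the non-smoker comparison
        "nonsmoker_only": nonsmoker - smoker,
    }


# Placeholder gene labels, used purely for illustration
groups = partition_degs(
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_B", "GENE_C", "GENE_D"],
)
for name, genes in groups.items():
    print(name, sorted(genes))
```

Under this reading, a gene such as SEMA6A would fall in the non-smoker-only set, while a smoking-specific gene such as FLG would fall in the smoker-only set.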