Vulnerability detection in Java source code using a quantum convolutional neural network with self-attentive pooling, deep sequence, and graph-based hybrid feature extraction

Shumaila Hussain; Muhammad Nadeem; Junaid Baber; Mohammed Hamdi; Adel Rajab; Mana Saleh Al Reshan; Asadullah Shaikh

doi:10.1038/s41598-024-56871-z

Sci Rep. 2024 Mar 28;14(1):7406. doi: 10.1038/s41598-024-56871-z.

Authors

Shumaila Hussain¹ ², Muhammad Nadeem³, Junaid Baber⁴ ⁵, Mohammed Hamdi⁶, Adel Rajab⁶, Mana Saleh Al Reshan⁷, Asadullah Shaikh⁷

Affiliations

¹ Department of Computer Science, Sardar Bahadur Khan Women's University, Quetta, Pakistan. shumaila.hussain@sbkwu.edu.pk.
² Department of Computer Science and IT, University of Balochistan, Quetta, Pakistan. shumaila.hussain@sbkwu.edu.pk.
³ Higher Colleges of Technology, Abu Dhabi, United Arab Emirates.
⁴ Department of Computer Science and IT, University of Balochistan, Quetta, Pakistan.
⁵ GIPSA-Lab, University Grenoble Alpes, 38000, Grenoble, France.
⁶ Department of Computer Science, College of Computer Science and Information Systems, Najran University, 61441, Najran, Saudi Arabia.
⁷ Department of Information Systems, College of Computer Science and Information Systems, Najran University, 61441, Najran, Saudi Arabia.

Abstract

Software vulnerabilities pose a significant threat to system security, necessitating effective automatic detection methods. Current techniques face challenges such as dependency issues, language bias, and coarse detection granularity. This study presents a novel deep learning-based vulnerability detection system for Java code. Hybrid feature extraction through graph- and sequence-based techniques enhances semantic and syntactic understanding. The system utilizes control flow graphs (CFG), abstract syntax trees (AST), program dependencies (PD), and greedy longest-match-first vectorization for graph representation. A hybrid neural network (GCN-RFEMLP) and the pre-trained CodeBERT model extract features and feed them into a quantum convolutional neural network with self-attentive pooling. The system addresses long-term information dependency and coarse detection granularity by employing intermediate code representation and inter-procedural slice code. To mitigate language bias, the benchmark Software Assurance Reference Dataset is employed. In evaluations, the system achieves 99.2% accuracy in detecting vulnerabilities, outperforming benchmark methods. The proposed approach covers vulnerabilities listed in the Common Weakness Enumeration (CWE), including improper input validation, missing authorization, buffer overflow, cross-site scripting, and SQL injection.
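
The abstract names greedy longest-match-first vectorization without giving details. As a rough illustration only, the general technique can be sketched as below: each code token is split, left to right, into the longest pieces found in a vocabulary, and the pieces are mapped to integer ids. The `##` continuation marker follows the WordPiece convention, and the function names, the toy vocabulary, and the id assignment are all hypothetical assumptions, not taken from the paper.

```python
def greedy_longest_match_first(token, vocab, unk="[UNK]"):
    """Split one token into vocabulary pieces, always taking the longest match."""
    pieces = []
    start = 0
    while start < len(token):
        end = len(token)
        match = None
        # Shrink the candidate from the right until it appears in the vocabulary.
        while end > start:
            candidate = token[start:end]
            if start > 0:
                # Assumed WordPiece-style marker for non-initial pieces.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole token is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; a real one would be learned from a code corpus.
VOCAB = {"get", "##User", "##Input", "##In", "##put", "exec", "##ute", "##Query"}
ID_OF = {piece: i for i, piece in enumerate(sorted(VOCAB | {"[UNK]"}))}


def vectorize(tokens):
    """Map a list of code tokens to a flat list of integer piece ids."""
    return [
        ID_OF[piece]
        for token in tokens
        for piece in greedy_longest_match_first(token, VOCAB)
    ]


if __name__ == "__main__":
    print(greedy_longest_match_first("getUserInput", VOCAB))
    print(vectorize(["getUserInput", "executeQuery", "foo"]))
```

Note that `##Input` is chosen over the shorter `##In` followed by `##put`, which is what "longest match first" means; a token with no matching piece at some position collapses to the single unknown marker.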

Keywords: CodeBERT; Feature extraction; Hybrid GCN; Self-attentive QCNN; Software security; Vulnerability detection.

Grants and funding

The authors are thankful to the Deanship of Scientific Research at Najran University for funding this work under the Research Groups Funding Program grant code NU/RG/SERC/12/34.