A block cipher algorithm identification scheme based on hybrid k-nearest neighbor and random forest algorithm

PeerJ Comput Sci. 2022 Oct 10:8:e1110. doi: 10.7717/peerj-cs.1110. eCollection 2022.

Abstract

Cryptographic algorithm identification, which refers to analyzing and identifying the encryption algorithm used in a cryptographic system, is of great significance to cryptanalysis. To improve identification accuracy, this article proposes a new ensemble learning-based model named hybrid k-nearest neighbor and random forest (HKNNRF) and uses it to construct a block cipher algorithm identification scheme. In the ciphertext-only scenario, we use NIST randomness test methods to extract ciphertext features, and carry out binary-classification and five-classification experiments on block cipher algorithms using the proposed scheme. Experiments show that, when the ciphertext size and other experimental conditions are the same, the HKNNRF model achieves higher classification accuracy than the baselines. Specifically, the average binary-classification identification accuracy of HKNNRF is 69.5%, which is 13%, 12.5%, and 10% higher than that of the single-layer support vector machine (SVM), k-nearest neighbor (KNN), and random forest (RF), respectively. The five-classification identification accuracy can reach 34%, which is higher than the 21% accuracy of KNN, the 22% accuracy of RF, and the 23% accuracy of SVM under the same experimental conditions.
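
To make the feature-extraction step concrete, the following is a minimal sketch of how a ciphertext could be turned into a feature vector using two tests from the NIST SP 800-22 suite, the frequency (monobit) test and the runs test. The abstract does not state which NIST tests are used or how their outputs are encoded, so the choice of these two tests, the use of raw p-values as features, and all function names below are assumptions made for illustration only.

```python
import math


def bits_from_bytes(data: bytes) -> list:
    """Unpack bytes into a list of 0/1 bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def monobit_p_value(data: bytes) -> float:
    """NIST SP 800-22 frequency (monobit) test p-value."""
    bits = bits_from_bytes(data)
    n = len(bits)
    if n == 0:
        raise ValueError("data must not be empty")
    # Map bits to +1/-1 and sum; a balanced sequence sums to about 0.
    s_n = sum(2 * b - 1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def runs_p_value(data: bytes) -> float:
    """NIST SP 800-22 runs test p-value."""
    bits = bits_from_bytes(data)
    n = len(bits)
    if n == 0:
        raise ValueError("data must not be empty")
    pi = sum(bits) / n  # proportion of ones
    # Prerequisite frequency check from the NIST specification; the extra
    # guard on pi avoids a zero denominator for very short constant inputs.
    if pi in (0.0, 1.0) or abs(pi - 0.5) >= 2 / math.sqrt(n):
        return 0.0
    # Number of runs = 1 + number of positions where adjacent bits differ.
    v_obs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    numerator = abs(v_obs - 2 * n * pi * (1 - pi))
    denominator = 2 * math.sqrt(2 * n) * pi * (1 - pi)
    return math.erfc(numerator / denominator)


def extract_features(ciphertext: bytes) -> list:
    """Hypothetical feature vector: one p-value per randomness test."""
    return [monobit_p_value(ciphertext), runs_p_value(ciphertext)]
```

For example, `extract_features(os.urandom(1024))` (after `import os`) returns two p-values in the range [0, 1] for a random 1 KiB buffer. Vectors of this kind, computed per ciphertext file and labeled with the cipher that produced it, are the sort of input a KNN or random forest classifier could be trained on; how HKNNRF combines the two classifiers is not described in the abstract and is not sketched here.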

Keywords: Cryptographic algorithm identification; K-nearest neighbor algorithm; Machine learning; Random forest algorithm; Randomness test.

Grants and funding

This work was supported by the National Natural Science Foundation of China (61972073, 61972215, 62066040); the Natural Science Foundation of Tianjin (20JCZDJC00640); the Key Specialized Research and Development Program of Henan Province (222102210062); the Basic Higher Educational Key Scientific Research Program of Henan Province (22A413004); and the National Innovation Training Program of University Student (202110475072). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.