
Computer Engineering ›› 2023, Vol. 49 ›› Issue (9): 109-117. doi: 10.19678/j.issn.1000-3428.0065750

• Cyberspace Security •

Working Mode Recognition for SM4 Algorithm Based on Transformer

Yaping CHI1,2, Ziyan YUE1, Yuheng LIN1   

  1. Department of Cyberspace Security, Beijing Electronics Science and Technology Institute, Beijing 100070, China
    2. Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences, Beijing 100093, China
  • Received: 2022-09-14 Online: 2023-09-15 Published: 2022-12-13
  • About the authors:

    CHI Yaping (born 1969), female, professor; her main research interests are virtualization security, trusted computing, encryption technology, and software-defined networking.

    YUE Ziyan, master's student.

    LIN Yuheng, master's student.

  • Funding:
    National Key Research and Development Program of China (2018YFB1004100)


Abstract:

Cipher algorithm recognition is a prerequisite for the supervision of cryptographic equipment and for cryptanalysis. Building on a summary and analysis of existing cipher algorithm recognition schemes, this study uses the K-Nearest Neighbor (KNN) algorithm and a randomness detection tool to analyze why ciphertext produced under the different working modes of the SM4 block cipher algorithm is recognized with low accuracy. To address the low recognition accuracy of existing schemes when ciphertexts from multiple SM4 working modes are mixed, the study demonstrates the feasibility of applying deep learning to the working mode recognition problem of the SM4 block cipher algorithm and proposes a Transformer-based scheme for recognizing the working mode of SM4 ciphertext. Files are encrypted in batches in the ECB, CBC, CFB, OFB, and CTR working modes; the ciphertext files are preprocessed into a ciphertext dataset and then input into a Transformer model for five-class recognition. Experimental results show that recognition accuracy for the five SM4 working modes reaches 94.94% in the mixed-ciphertext scenario, demonstrating that the proposed scheme effectively improves the recognition accuracy of the five working modes in that scenario. Inputting the same ciphertext dataset into a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and ResNet for comparative experiments shows that the self-attention-based Transformer model improves recognition accuracy over these three traditional neural networks by 18.38, 26.96, and 10.44 percentage points, respectively.

Key words: cipher algorithm recognition, SM4 algorithm, working mode, deep learning, Transformer model
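The five working modes named in the abstract differ only in how successive 16-byte blocks are chained together. The sketch below illustrates that chaining with a toy, insecure stand-in for the SM4 block primitive (the real cipher is not implemented here), and shows why ECB ciphertext is the easiest mode to distinguish: identical plaintext blocks produce identical ciphertext blocks.

```python
BLOCK = 16  # SM4 block size in bytes

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Toy stand-in for the SM4 block primitive -- NOT secure, illustration only.
    return bytes(((b ^ k) + i) % 256 for i, (b, k) in enumerate(zip(block, key)))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split_blocks(data: bytes):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def ecb(key, pt):
    # Each block encrypted independently: repeated plaintext blocks
    # yield repeated ciphertext blocks.
    return b"".join(toy_block_encrypt(key, b) for b in split_blocks(pt))

def cbc(key, iv, pt):
    # Each plaintext block is XORed with the previous ciphertext block first.
    out, prev = [], iv
    for b in split_blocks(pt):
        prev = toy_block_encrypt(key, xor(b, prev))
        out.append(prev)
    return b"".join(out)

def cfb(key, iv, pt):
    # The previous ciphertext block is encrypted to produce the keystream.
    out, prev = [], iv
    for b in split_blocks(pt):
        prev = xor(toy_block_encrypt(key, prev), b)
        out.append(prev)
    return b"".join(out)

def ofb(key, iv, pt):
    # The cipher is iterated on its own output to produce the keystream.
    out, state = [], iv
    for b in split_blocks(pt):
        state = toy_block_encrypt(key, state)
        out.append(xor(state, b))
    return b"".join(out)

def ctr(key, iv, pt):
    # An incrementing counter is encrypted to produce the keystream.
    out = []
    for i, b in enumerate(split_blocks(pt)):
        counter = (int.from_bytes(iv, "big") + i).to_bytes(BLOCK, "big")
        out.append(xor(toy_block_encrypt(key, counter), b))
    return b"".join(out)
```

These chaining differences are what leave statistical traces in the ciphertext: ECB preserves block-level repetition, while the feedback and counter modes behave like stream ciphers and produce output that is much harder to tell apart, consistent with the low mixed-mode accuracy the abstract reports for conventional classifiers.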

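The abstract's preprocessing step (ciphertext files turned into a labeled dataset for five-class recognition) could look like the following sketch. The fixed sample length, byte-level tokenization, and zero-padding are illustrative assumptions, not the paper's published settings:

```python
MODES = ["ECB", "CBC", "CFB", "OFB", "CTR"]  # the five SM4 working modes

def ciphertext_to_sample(ct: bytes, seq_len: int = 256) -> list:
    """Map raw ciphertext to a fixed-length token sequence.

    Each byte becomes one token in [0, 255]; inputs are truncated or
    zero-padded so every sample has the same length.
    """
    tokens = list(ct[:seq_len])
    return tokens + [0] * (seq_len - len(tokens))

def build_dataset(files_by_mode: dict):
    """Pair each preprocessed sample with its mode label (0-4)."""
    samples, labels = [], []
    for label, mode in enumerate(MODES):
        for ct in files_by_mode.get(mode, []):
            samples.append(ciphertext_to_sample(ct))
            labels.append(label)
    return samples, labels
```

The resulting integer sequences and labels would then be fed to the classifier (the Transformer model in the paper's scheme, or the CNN/RNN/ResNet baselines it compares against).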