Smart Contract Defect Detection Method Based on Multi-Feature Fusion

doi:10.19678/j.issn.1000-3428.0068522

Abstract:

Smart contracts are among the most successful applications of blockchain technology, and their widespread use has drawn researchers' attention to their security. Although several studies have addressed defect detection in smart contracts, the features of smart contract code remain insufficiently mined. This paper proposes a smart contract defect detection method based on multi-feature fusion. First, the smart contract code is preprocessed through color labeling, vocabulary extraction, ASCII character conversion, and extraction of the inheritance relationships between contracts. The outputs of the first three steps are then fed into a fusion model built from Bidirectional Encoder Representations from Transformers (BERT), a Convolutional Neural Network (CNN), and a Bidirectional Long Short-Term Memory (BiLSTM) network for feature extraction. In parallel, the inheritance relationships between contracts are fed into the node2vec random walk algorithm to obtain feature vectors for the contract relationships. Finally, all feature vectors are concatenated and passed to a classifier for defect classification. The method is validated on a dataset of real Solidity smart contracts. Experimental results show that, compared with other models, the proposed multi-feature fusion model improves the F1 value by 6%-12% and accuracy by 4%-11%. The method mines the deep features of smart contract code more effectively, improves defect detection performance, and has some practical value for smart contract security.

Key words: blockchain, smart contract, Solidity language, multi-feature, defect detection
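
Two of the steps summarized in the abstract, ASCII character conversion and the node2vec walk over inheritance relationships, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy contract source, the regular expression, the function names, the padding length, and the values of p and q are all assumptions made for illustration. The BERT, CNN, and BiLSTM feature extractors and the skip-gram training that turns walks into node2vec embeddings are omitted.

```python
import random
import re

# Hypothetical toy source, not taken from the paper's dataset.
SOURCE = """
contract Ownable { address owner; }
contract Pausable is Ownable { bool paused; }
contract Token is Pausable, Ownable { uint256 supply; }
"""


def ascii_codes(code, max_len=16):
    """Map each character to its code point, then truncate or zero-pad to max_len."""
    codes = [ord(ch) for ch in code[:max_len]]
    return codes + [0] * (max_len - len(codes))


def inheritance_edges(code):
    """Return (child, parent) pairs found in `contract A is B, C {` headers.

    Simplification: constructor arguments on parents are not handled.
    """
    pattern = re.compile(r"contract\s+(\w+)\s+is\s+([^{]+)\{")
    edges = []
    for child, parents in pattern.findall(code):
        for parent in parents.split(","):
            edges.append((child, parent.strip()))
    return edges


def build_graph(edges):
    """Build an undirected adjacency map with sorted neighbor lists."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return {node: sorted(nbrs) for node, nbrs in graph.items()}


def biased_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Second-order random walk in the style of node2vec.

    p controls the chance of returning to the previous node,
    q controls the chance of moving away from it.
    """
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = graph.get(cur, [])
        if not nbrs:
            break
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stay one hop from the previous node
            else:
                weights.append(1.0 / q)  # move farther from the previous node
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


edges = inheritance_edges(SOURCE)
graph = build_graph(edges)
print(ascii_codes("contract"))
print(edges)
print(biased_walk(graph, "Token", length=5, p=0.5, q=2.0))
```

In a full pipeline, walks like these would be fed to a skip-gram model to obtain one embedding per contract, which would then be concatenated with the code features before classification.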

Yifeng WANG, Cheng ZENG, Qingyu QUAN, Jiaoran WANG, Peng HE. Smart Contract Defect Detection Method Based on Multi-Feature Fusion[J]. Computer Engineering, 2024, 50(8): 133-141.

https://www.ecice06.com/CN/Y2024/V50/I8/133

Figures/Tables: 8

Fig.1 Overall framework of the multi-feature fusion defect detection model for Solidity smart contracts

Fig.2 Schematic diagram of a smart contract code tree

Fig.3 Architecture of the defect detection classification layer

Fig.4 Comparison of accuracy and F1 value across defect types

Fig.5 Accuracy comparison of four types of models across defect types

Fig.6 F1 value comparison of four types of models across defect types

