
Computer Engineering

   

A malicious code classification method based on the fusion of LightGBM and multidimensional features

  

  • Published: 2025-07-03


Abstract: In the field of computer security, protection against malicious code has long been an important research topic. With the rapid development of computer technology, the types and forms of malicious code are constantly evolving. Traditional feature engineering methods use a single feature dimension when handling complex malicious samples, which limits their representational ability and prevents them from accurately identifying the various types of malicious code. Other classification methods based on feature fusion rely on expert experience to design features manually during feature extraction, while multimodal deep learning models offer limited interpretability and incur high computational costs. To address these issues, this paper proposes a feature fusion method for classifying malicious code in Windows PE files. The method integrates behavioral, structural, and texture features and uses LightGBM as the classifier. Experimental results show that the proposed method achieves a test accuracy of 99.90% and a log loss (Logloss) of 0.0057 on the Microsoft Malware Classification Challenge dataset, and a test accuracy of 98.97% and a log loss of 0.042 on the Bazaar dataset. These results indicate that the method represents malicious code comprehensively and accurately, and that it has both theoretical significance and practical value. By fusing multidimensional features, the method provides an effective solution for malicious code detection and has broad application prospects.
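As a rough illustration of the ideas named in the abstract, the sketch below shows one common way to fuse several feature groups (simple concatenation, often called early fusion) and how the two reported metrics, accuracy and multiclass log loss, are computed. The abstract does not state how the paper combines its features, so the concatenation step and the function names `fuse_features`, `multiclass_log_loss`, and `accuracy` are assumptions made for illustration, and the numbers in the usage lines are toy values, not data from the paper.

```python
import math


def fuse_features(behavioral, structural, texture):
    """Early fusion (assumed here): concatenate the three feature groups
    of one sample into a single flat vector."""
    return list(behavioral) + list(structural) + list(texture)


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of integer class labels.
    probs:  list of per-class probability lists, one per sample.
    """
    total = 0.0
    for label, p in zip(y_true, probs):
        # Clip to avoid log(0) when a classifier outputs exactly 0 or 1.
        clipped = min(max(p[label], eps), 1.0 - eps)
        total -= math.log(clipped)
    return total / len(y_true)


def accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class is the true class."""
    correct = 0
    for label, p in zip(y_true, probs):
        predicted = max(range(len(p)), key=lambda k: p[k])
        if predicted == label:
            correct += 1
    return correct / len(y_true)


# Toy usage with made-up values (not taken from the paper).
fused = fuse_features([1, 0], [0.5], [0.2, 0.3])
toy_labels = [0, 1]
toy_probs = [[0.9, 0.1], [0.2, 0.8]]
toy_loss = multiclass_log_loss(toy_labels, toy_probs)
toy_acc = accuracy(toy_labels, toy_probs)
```

In practice, fused vectors like `fused` would be stacked into a feature matrix and passed to a gradient-boosted classifier such as `lightgbm.LGBMClassifier`, whose predicted class probabilities can then be scored with the two metric functions above.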
