
Computer Engineering ›› 2026, Vol. 52 ›› Issue (5): 270-280. doi: 10.19678/j.issn.1000-3428.0070472

• Cyberspace Security •

C2 Traffic Detection Based on a Context-Aware Language Model

WU Peiying1, LI Xiaohui2, WANG Junfeng1,*

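  1. College of Computer Science, Sichuan University, Chengdu 610065, Sichuan, China
  2. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, Sichuan, China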
  • Received:2024-10-12 Revised:2024-11-22 Online:2026-05-15 Published:2026-05-12
  • Corresponding author: WANG Junfeng
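  • About the authors:

    WU Peiying, female, master's student; main research interests: network and information security

    LI Xiaohui, associate research fellow, Ph.D.

    WANG Junfeng (corresponding author), research fellow, Ph.D.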

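  • Supported by:
    National Natural Science Foundation of China (U2133208); Sichuan Province Key Research and Development Program (2023YFG0290); Sichuan University-Luzhou Municipal People's Government Strategic Cooperation Project (2022CDLZ-5)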

C2 Traffic Detection Based on Context-aware Language Model

WU Peiying1, LI Xiaohui2, WANG Junfeng1,*

  1. College of Computer Science, Sichuan University, Chengdu 610065, Sichuan, China
  2. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, Sichuan, China
  • Received:2024-10-12 Revised:2024-11-22 Online:2026-05-15 Published:2026-05-12
  • Contact: WANG Junfeng

Abstract:

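Command and control (C2) communication plays a central role in modern advanced persistent threats (APTs): it is the key communication link that allows an APT to remain hidden over long periods and maintain continuous control. Detecting C2 traffic is therefore essential for defending against APT attacks and protecting network security. Existing C2 traffic detection methods, however, rely mainly on traditional machine learning and deep learning. Their feature engineering depends on expert experience, is highly subjective and prone to omissions, and adapts poorly to rapidly evolving attack forms and traffic patterns; traditional deep learning models, for their part, capture deep, complex features poorly and depend heavily on labeled data and training resources. To address these problems, this paper proposes C2BT, a C2 traffic detection method based on Bidirectional Encoder Representations from Transformers (BERT). Unlike traditional detection methods built on feature engineering, C2BT uses the BERT large language model to automatically learn and capture deep contextual features of remote-control network traffic. It further introduces a separately trained Transformer decoder that performs reconstruction and error calculation to assess the quality of the encoder's representations, and it incorporates the reconstruction error into the subsequent optimization training of the encoder, further improving the model's detection performance and robustness. In extensive experiments on multiple different C2 traffic datasets, the proposed method shows excellent performance and strong generalization ability, reaching an accuracy, precision, and F1 score of 98.47%, 95.82%, and 95.91%, respectively, and it maintains stable results on entirely new datasets, demonstrating its effectiveness for C2 traffic detection. In addition, the decoder reconstruction error evaluation mechanism verifies the robustness of the encoder and further improves the validity of the detection results, offering a new technical path toward more efficient network security detection and defense systems.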

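Key words: command and control, traffic detection, bidirectional encoding representation, multi-head attention mechanism, reconstruction error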

Abstract:

Command and Control (C2) communication plays an essential role in modern Advanced Persistent Threats (APTs) and is the key communication link through which they achieve long-term persistence and continuous control. C2 traffic detection is crucial for defending against APT attacks and protecting network security. However, existing C2 traffic detection methods are mainly based on conventional machine learning and deep learning. In these methods, feature engineering relies on expert experience, is highly subjective and prone to omissions, and adapts poorly to rapidly evolving attack forms and traffic patterns. Moreover, traditional deep learning models capture deep and complex features poorly and depend strongly on labeled data and training resources. To address these issues, this paper proposes C2BT, a C2 traffic detection method based on Bidirectional Encoder Representations from Transformers (BERT). Unlike conventional detection methods based on feature engineering, this method uses the BERT large language model to automatically learn and capture deep contextual features of remote-control traffic. It further introduces a separately trained Transformer decoder for reconstruction and error calculation to evaluate the quality of the encoder's representations, and it incorporates the reconstruction error into the subsequent optimization training of the encoder to further improve the detection performance and robustness of the model. Extensive experiments are conducted on multiple different C2 traffic datasets. The proposed method demonstrates excellent performance and strong generalization capability, with its accuracy, precision, and F1 score reaching 98.47%, 95.82%, and 95.91%, respectively. It maintains stable results on entirely new datasets, demonstrating its effectiveness in C2 traffic detection. In addition, the decoder reconstruction error evaluation mechanism verifies the robustness of the encoder and further improves the effectiveness of the detection results. The proposed method provides a new technical pathway for building a more efficient network security detection and defense system.

Key words: Command and Control (C2), traffic detection, bidirectional encoding representation, multi-head attention mechanism, reconstruction error
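
The abstract does not state the exact form in which the reconstruction error enters encoder training. As a minimal sketch, assuming the simplest option of adding a weighted reconstruction term to the classification loss, the idea could look like the following. The function names, the choice of mean squared error, and the `weight` parameter are illustrative assumptions, not details taken from the paper.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for one sample."""
    return -math.log(probs[label])


def reconstruction_error(original, reconstructed):
    """Mean squared error between an input vector and the decoder's reconstruction."""
    squared = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return sum(squared) / len(squared)


def combined_loss(probs, label, original, reconstructed, weight=0.5):
    """Classification loss plus a weighted reconstruction penalty.

    Assumed form: the paper may combine the two terms differently.
    """
    return cross_entropy(probs, label) + weight * reconstruction_error(
        original, reconstructed
    )


# Toy values (hypothetical): a two-class prediction and a 2-dimensional input.
probs = [0.5, 0.5]        # predicted class probabilities
label = 0                 # index of the true class
features = [1.0, 2.0]     # stand-in for an encoded traffic sample

# Perfect reconstruction: only the classification term remains.
print(round(combined_loss(probs, label, features, [1.0, 2.0]), 4))  # 0.6931

# Imperfect reconstruction: the loss grows by weight * MSE = 0.5 * 0.5.
print(round(combined_loss(probs, label, features, [0.0, 2.0]), 4))  # 0.9431
```

With this formulation, an encoder whose representations the decoder cannot reconstruct well is penalized even when its classification output is correct, which is one plausible way a reconstruction signal could encourage more robust representations.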