A Cyber Threat Hunting Method using Reinforcement Learning Agent with Large Language Models

doi:10.19678/j.issn.1000-3428.0070706

Abstract

Abstract: Cyber threat hunting enables rapid response to attacks by proactively discovering attack clues and malicious evidence. Existing threat hunting methods can make decisions across a wide range of information sources, but in real-world settings they suffer from a lack of prior knowledge and from sparse feedback. To address these problems, this paper proposes RE-HUNTER, a threat hunting algorithm based on large language models (LLMs) and reinforcement learning. To compensate for the lack of prior knowledge, RE-HUNTER builds a contextual vector database and draws on the domain knowledge in LLMs and the unstructured knowledge in cyber threat intelligence to improve cold-start decisions and to initialize the reinforcement learning weights. To counter sparse feedback, it extends the Monte Carlo Tree Search algorithm with a recursive update mechanism and a method similarity mechanism, which strengthen the feedback obtained from the execution results of entities and methods. In threat hunting experiments on 186 real-world attack cases, RE-HUNTER significantly improves search efficiency over the current state-of-the-art baseline: the average recall rate improves by a relative 18.24% over the 0–2000-step range and by a relative 86.28% over the 0–250-step cold-start range. Ablation experiments further show that each component of the method contributes positively to the results and that the method can effectively reduce the cost of cyber threat hunting.
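The two feedback mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the RE-HUNTER implementation: the decay factor, the weighted-mean update rule, the prior scores, the entity and method names, and the helpers `backup`, `share_feedback`, and `toy_similarity` are all assumptions made for illustration. The sketch shows one plausible reading: a reward found at a leaf entity is propagated recursively to its ancestors, and a reward earned by one method is partly credited to similar methods, whose estimates start from prior scores rather than from zero.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Node:
    """A search-tree node: an entity reached by running a method on its parent."""
    name: str
    parent: Optional["Node"] = None
    visits: int = 0
    value: float = 0.0  # running mean of the rewards credited to this node


def backup(node: Optional[Node], reward: float, decay: float = 0.5) -> None:
    """Recursive update: credit this node, then its ancestors with a decayed reward."""
    if node is None:
        return
    node.visits += 1
    node.value += (reward - node.value) / node.visits
    backup(node.parent, reward * decay, decay)


# method name -> (total weight observed so far, weighted mean reward)
MethodStats = Dict[str, Tuple[float, float]]


def share_feedback(stats: MethodStats, method: str, reward: float,
                   similarity: Callable[[str, str], float]) -> None:
    """Credit the executed method fully and similar methods proportionally."""
    for other, (weight, mean) in list(stats.items()):
        w = 1.0 if other == method else similarity(method, other)
        if w <= 0.0:
            continue  # dissimilar methods receive no feedback
        total = weight + w
        stats[other] = (total, mean + w * (reward - mean) / total)


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical similarity: only the two DNS-related methods are related."""
    return 0.5 if {a, b} == {"passive_dns", "whois"} else 0.0


# Hypothetical priors standing in for scores an LLM might assign before any
# method has run; each prior counts as one pseudo-observation (cold start).
stats: MethodStats = {
    "passive_dns": (1.0, 0.6),
    "whois": (1.0, 0.2),
    "sandbox": (1.0, 0.1),
}

root = Node("seed-domain")
ip = Node("resolved-ip", parent=root)
sample = Node("malware-hash", parent=ip)

backup(sample, reward=1.0)                                # leaf hit credits the whole path
share_feedback(stats, "passive_dns", 1.0, toy_similarity)  # success also lifts "whois"
```

In this toy run a single successful step updates three tree nodes and two method estimates, which is the sense in which both mechanisms densify an otherwise sparse reward signal. The prior scores play the role of the cold-start initialization: with them, the first selection is already informed instead of uniform.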

Zeyuan Cui, Wenhan Ge, Junfeng Wang. A Cyber Threat Hunting Method using Reinforcement Learning Agent with Large Language Models[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0070706.
