Computer Engineering (计算机工程)

A Cyber Threat Hunting Method using Reinforcement Learning Agent with Large Language Models

  • Published: 2025-05-08

Abstract: Cyber threat hunting enables rapid response to attacks by proactively discovering attack clues and malicious evidence. Existing cyber threat hunting methods can make decisions across a wide range of information sources, but in real-world scenarios they suffer from insufficient prior knowledge and sparse feedback. To address these problems, this paper proposes RE-HUNTER, a cyber threat hunting algorithm based on Large Language Models (LLMs) and Reinforcement Learning (RL). To compensate for the lack of prior knowledge, the method builds a contextual vector database and uses domain knowledge from LLMs together with unstructured knowledge from Cyber Threat Intelligence to improve cold-start decision-making and to initialize the RL weights. To mitigate sparse feedback, the method improves the Monte Carlo Tree Search algorithm with a recursive update mechanism and a method similarity mechanism, which strengthen the feedback from the execution results of both entities and methods. Threat hunting experiments on 186 real-world attack cases show that the model significantly improves search efficiency over the current state-of-the-art baseline: average recall improves by 18.24% in relative terms over the 0–2000-step range, and by 86.28% over the 0–250-step cold-start range. Ablation experiments further indicate that each component of the method contributes positively to the results, effectively reducing the cost of cyber threat hunting.
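The abstract names three mechanisms without giving their formulas: prior-based initialization of the search weights, a recursive update, and a method similarity update. The Python sketch below is only an illustration of how such mechanisms could fit into a tree search; it is not RE-HUNTER's implementation. All names (`Node`, `ucb_score`, `recursive_update`, `similarity_update`), the decay factor, the exploration constant, and the similarity table are assumptions introduced here, and the `prior` field merely stands in for a score that might come from an LLM or threat-intelligence lookup.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """One entity in the hunt tree (hypothetical structure, not from the paper)."""
    name: str
    method: str                      # hunting method that produced this entity
    prior: float = 0.0               # stand-in for an LLM/CTI-derived prior score
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)
    visits: float = 0.0
    value_sum: float = 0.0

    def __post_init__(self) -> None:
        # Cold start: seed the statistics with one pseudo-visit worth `prior`.
        self.visits = 1.0
        self.value_sum = self.prior

    @property
    def q(self) -> float:
        return self.value_sum / self.visits

    def add_child(self, name: str, method: str, prior: float = 0.0) -> "Node":
        child = Node(name, method, prior, parent=self)
        self.children.append(child)
        return child


def ucb_score(node: Node, c: float = 1.4) -> float:
    """Standard UCB1-style score: mean value plus an exploration bonus."""
    parent_visits = node.parent.visits if node.parent else node.visits
    return node.q + c * math.sqrt(math.log(parent_visits + 1.0) / node.visits)


def select_child(node: Node) -> Node:
    return max(node.children, key=ucb_score)


def recursive_update(node: Optional[Node], reward: float, decay: float = 0.5) -> None:
    """Propagate a reward to every ancestor, decaying it at each level (assumed rule)."""
    while node is not None:
        node.visits += 1.0
        node.value_sum += reward
        reward *= decay
        node = node.parent


def similarity_update(
    nodes: List[Node],
    executed: Node,
    reward: float,
    similarity: Dict[Tuple[str, str], float],
) -> None:
    """Share a reward with nodes whose method resembles the executed one.

    Each similar node receives a fractional observation weighted by the
    similarity score s in (0, 1] (assumed rule).
    """
    for n in nodes:
        if n is executed:
            continue
        s = similarity.get((executed.method, n.method), 0.0)
        if s > 0.0:
            n.visits += s
            n.value_sum += s * reward
```

With equal visit counts, `select_child` prefers the child with the higher prior, which is one way a prior can shape early (cold-start) choices. The similarity step lets a single executed method inform the estimates of related methods, so a sparse reward updates more than one node.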