A Review of Anomaly in Large Language Model-Based Multi-Agent Systems (Invited)

Computer Engineering, 2026, Vol. 52, Issue (1): 22-32. doi: 10.19678/j.issn.1000-3428.0252754

• Service Computing in the Era of Large Models •

ZHANG Longyao¹, WEN Dongxin¹, MA Zhuangyu¹, SHU Yanjun¹,*, LI Qing¹,², LIU Mingyi¹, ZUO Decheng¹

1. Faculty of Computing, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
2. Jiangsu Automation Research Institute, Lianyungang 222006, Jiangsu, China

Received: 2025-07-14 Revised: 2025-10-13 Online: 2025-11-26 Published: 2026-01-15
Corresponding author: SHU Yanjun
About the authors:
ZHANG Longyao (CCF student member), male, master's degree; main research interests: fault-tolerance techniques for edge computing and anomaly analysis of multi-agent systems
WEN Dongxin, Ph.D.
MA Zhuangyu, Ph.D.
SHU Yanjun (corresponding author), Ph.D.
LI Qing, Ph.D.
LIU Mingyi, Ph.D.
ZUO Decheng, Ph.D.
Funding:
National Key Research and Development Program of China (2024YFB4506000); National Natural Science Foundation of China (61202091); National Natural Science Foundation of China (62171155)

Abstract:

Although Large Language Model (LLM)-based Multi-Agent Systems (MAS) have demonstrated significant potential in handling complex tasks, their distributed nature and interaction uncertainty can give rise to diverse anomalies that threaten system reliability. This paper presents a comprehensive review that systematically identifies and classifies these anomalies. Seven representative MAS and their corresponding datasets are selected, yielding 13,418 execution traces, which are analyzed by combining preliminary LLM analysis with manual validation by experts. A fine-grained anomaly classification framework is constructed with four levels: model understanding and perception anomalies, agent interaction anomalies, task execution anomalies, and external environment anomalies. Typical cases are used to reveal the internal logic and external triggers behind each type of anomaly. Statistical analysis shows that model understanding and perception anomalies account for the highest proportion, with "context hallucination" and "task instruction misunderstanding" as the main problems; agent interaction anomalies account for 16.8%, with "information concealment" as the main cause; task execution anomalies account for 27.1%, mainly manifested as "repeated decision errors"; and external environment anomalies account for 18.3%, dominated by "memory conflict". In addition, model understanding and perception anomalies often act as root causes that trigger anomalies at the other levels, which highlights the importance of improving fundamental model capabilities. The classification and root-cause analysis aim to provide theoretical support and practical reference for building highly reliable LLM-based MAS.

Key words: Large Language Model (LLM), agent, Multi-Agent System (MAS), anomaly statistics, anomaly classification
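The four-level classification and the per-level proportions reported in the abstract can be illustrated with a small tallying sketch. This is only an illustration under stated assumptions, not the authors' analysis pipeline: the names `AnomalyLevel`, `level_shares`, and `implied_remaining_share` are hypothetical, the toy labels are invented, and the residual share is computed on the assumption that the four levels are mutually exclusive and exhaustive, which the abstract does not state explicitly.

```python
from collections import Counter
from enum import Enum


class AnomalyLevel(Enum):
    """The four top-level anomaly categories named in the abstract."""
    MODEL_UNDERSTANDING_PERCEPTION = "model understanding and perception"
    AGENT_INTERACTION = "agent interaction"
    TASK_EXECUTION = "task execution"
    EXTERNAL_ENVIRONMENT = "external environment"


# Shares (in percent) stated in the abstract. The share of the first level
# is described only as "the highest proportion", so it is omitted here.
REPORTED_SHARE = {
    AnomalyLevel.AGENT_INTERACTION: 16.8,
    AnomalyLevel.TASK_EXECUTION: 27.1,
    AnomalyLevel.EXTERNAL_ENVIRONMENT: 18.3,
}


def level_shares(labels):
    """Return each level's percentage share of the labeled anomalies."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {level: 0.0 for level in AnomalyLevel}
    return {level: 100.0 * counts[level] / total for level in AnomalyLevel}


def implied_remaining_share(reported):
    """Share left for the unreported level, ASSUMING the levels are
    mutually exclusive and exhaustive (not confirmed by the abstract)."""
    return round(100.0 - sum(reported.values()), 1)


if __name__ == "__main__":
    # Hypothetical labels for ten anomalous traces, for illustration only.
    toy_labels = (
        [AnomalyLevel.MODEL_UNDERSTANDING_PERCEPTION] * 4
        + [AnomalyLevel.TASK_EXECUTION] * 3
        + [AnomalyLevel.AGENT_INTERACTION] * 2
        + [AnomalyLevel.EXTERNAL_ENVIRONMENT] * 1
    )
    for level, share in level_shares(toy_labels).items():
        print(f"{level.value}: {share:.1f}%")
    print("implied remaining share:", implied_remaining_share(REPORTED_SHARE))
```

Under that exclusivity assumption, the three stated shares would leave 37.8% for model understanding and perception anomalies, which is consistent with the abstract's claim that this level has the highest proportion.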

ZHANG Longyao, WEN Dongxin, MA Zhuangyu, SHU Yanjun, LI Qing, LIU Mingyi, ZUO Decheng. A Review of Anomaly in Large Language Model-Based Multi-Agent Systems (Invited)[J]. Computer Engineering, 2026, 52(1): 22-32.

https://www.ecice06.com/CN/Y2026/V52/I1/22

Figures/Tables 14

Fig.1 Basic architecture of LLM-based MAS

Fig.2 Process of anomaly analysis in LLM-based MAS

Fig.3 Composition of structured fields in Trace data

Fig.4 LLM-based Trace analysis methodology

Fig.5 Anomaly types in LLM-based MAS

Fig.6 Task instruction misunderstanding

Fig.7 Information concealment

Fig.8 Circular dependency

Fig.9 Deviation from the decision path

Fig.10 Non-termination after completion

Fig.11 Memory conflict

