
Computer Engineering ›› 2026, Vol. 52 ›› Issue (1): 22-32. doi: 10.19678/j.issn.1000-3428.0252754

• Service Computing in the Era of Large Models •

A Review of Anomalies in Large Language Model-Based Multi-Agent Systems (Invited)

ZHANG Longyao1, WEN Dongxin1, MA Zhuangyu1, SHU Yanjun1,*, LI Qing1,2, LIU Mingyi1, ZUO Decheng1

  1. Faculty of Computing, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
    2. Jiangsu Automation Research Institute, Lianyungang 222006, Jiangsu, China
  • Received: 2025-07-14; Revised: 2025-10-13; Online: 2026-01-15; Published: 2025-11-26
  • Corresponding author: SHU Yanjun
  • About the authors:

    ZHANG Longyao (CCF student member), male, M.S., research interests: fault-tolerance techniques for edge computing and anomaly analysis of multi-agent systems

    WEN Dongxin, Ph.D.

    MA Zhuangyu, Ph.D.

    SHU Yanjun (corresponding author), Ph.D.

    LI Qing, Ph.D.

    LIU Mingyi, Ph.D.

    ZUO Decheng, Ph.D.

  • Funding:
    National Key Research and Development Program of China (2024YFB4506000); National Natural Science Foundation of China (61202091); National Natural Science Foundation of China (62171155)

Abstract:

Large Language Model (LLM)-based Multi-Agent Systems (MAS) have demonstrated significant potential in handling complex tasks, but their distributed nature and interaction uncertainty can lead to diverse anomalies that threaten system reliability. This paper presents a comprehensive review that systematically identifies and classifies these anomalies. Seven representative multi-agent systems and their corresponding datasets are selected, yielding 13 418 operational traces, and a hybrid data analysis method is employed that combines preliminary LLM analysis with expert manual validation. A fine-grained, four-level anomaly classification framework is constructed, covering model understanding and perception anomalies, agent interaction anomalies, task execution anomalies, and external environment anomalies, and typical cases are analyzed to reveal the internal logic and external triggers of each type of anomaly. Statistical analysis indicates that model understanding and perception anomalies account for the highest proportion, with "context hallucination" and "task instruction misunderstanding" being the primary issues; agent interaction anomalies account for 16.8%, primarily caused by "information concealment"; task execution anomalies account for 27.1%, mainly characterized by "repetitive decision errors"; and external environment anomalies account for 18.3%, with "memory conflicts" as the predominant factor. In addition, model understanding and perception anomalies often act as root causes that trigger anomalies at other levels, highlighting the importance of enhancing fundamental model capabilities. This classification and root cause analysis aims to provide theoretical support and practical reference for building highly reliable LLM-based MAS.
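As a rough, non-authoritative illustration of the four-level taxonomy summarized above, the following Python sketch encodes the anomaly levels, the proportions reported in the abstract, and each level's dominant sub-type. All class and field names are illustrative assumptions rather than artifacts of the paper, and the share of the first level is left unspecified because the abstract states only that it is the largest.

```python
# Minimal sketch (illustrative names, not from the paper) of the four-level
# anomaly taxonomy and the proportions reported in the abstract.
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AnomalyLevel(Enum):
    MODEL_UNDERSTANDING_PERCEPTION = "model understanding and perception"
    AGENT_INTERACTION = "agent interaction"
    TASK_EXECUTION = "task execution"
    EXTERNAL_ENVIRONMENT = "external environment"


@dataclass
class LevelSummary:
    level: AnomalyLevel
    share_percent: Optional[float]  # None where the abstract gives no exact figure
    dominant_type: str              # most frequent sub-type reported for the level


# Figures taken directly from the abstract; the first level is reported only as
# the largest share, so its exact percentage is left unspecified here.
SUMMARY = [
    LevelSummary(AnomalyLevel.MODEL_UNDERSTANDING_PERCEPTION, None,
                 "context hallucination / task instruction misunderstanding"),
    LevelSummary(AnomalyLevel.AGENT_INTERACTION, 16.8, "information concealment"),
    LevelSummary(AnomalyLevel.TASK_EXECUTION, 27.1, "repetitive decision errors"),
    LevelSummary(AnomalyLevel.EXTERNAL_ENVIRONMENT, 18.3, "memory conflicts"),
]

if __name__ == "__main__":
    for s in SUMMARY:
        share = (f"{s.share_percent}%" if s.share_percent is not None
                 else "largest share (exact value not stated)")
        print(f"{s.level.value}: {share}; dominant type: {s.dominant_type}")
```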

Key words: Large Language Model (LLM), agent, Multi-Agent System (MAS), anomaly statistics, anomaly classification