Computer Engineering ›› 2026, Vol. 52 ›› Issue (2): 1-6. doi: 10.19678/j.issn.1000-3428.0253281

• Frontier Perspectives and Reviews •

Evolution of Large Model Technologies: World Models Drive Artificial Intelligence from Perception to Decision-Making (Invited)

WANG Limin, ZHU Guanghui, WU Tao   

  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, Jiangsu, China
  • Received:2025-11-20 Revised:2025-12-18 Published:2026-02-04

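  • About the authors: WANG Limin (CCF Senior Member), male, professor, Ph.D., research interests: computer vision and multimodal large models, E-mail: lmwang@nju.edu.cn; ZHU Guanghui (CCF Senior Member), tenure-track assistant professor, Ph.D.; WU Tao, Ph.D. candidate. E-mail: tianwang@bnu.edu.cn
  • Supported by:
    Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0160900); Natural Science Foundation of Jiangsu Province, Pandeng (Climbing) Project (BK20250009).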

Abstract: Large Language Models (LLMs) have propelled artificial intelligence into an era of natural language-centric interaction; however, they remain significantly limited in physical-world modeling and complex decision-making. To address these limitations, this paper takes the world model as its core paradigm and systematically analyzes the key technical pathways by which LLMs can evolve into decision-making agents. First, the capability boundaries of LLMs are delineated, highlighting their intrinsic limitations in structured knowledge representation, real-world perception, and applications that require high reliability. Second, the essence and key characteristics of world models are summarized in terms of dynamic prediction, task-driven selective modeling, multimodal fusion, and physical consistency. Third, data-driven generative modeling and physics-prior-driven simulation modeling are systematically reviewed and compared, and the technical challenges common to both are analyzed, including the acquisition of high-quality interactive data, long-term prediction consistency, unified multimodal representation, and real-time inference efficiency. Next, the role and practical limitations of world models on the path toward Artificial General Intelligence (AGI) are discussed from the perspectives of bridging common-sense gaps, enhancing planning and decision-making capabilities, and supporting embodied intelligence. Finally, in light of current technological trends, future research directions are outlined, including LLM-world model integration, data and algorithm co-optimization, fusion of physics priors with generative modeling, tight coupling with embodied intelligence, and ethical and safety governance. By systematically analyzing the current status of world-model technologies and exploring their future development, this paper provides a theoretical and practical reference for advancing artificial intelligence from perception to decision-making.

Key words: world model, Large Language Model (LLM), Artificial General Intelligence (AGI), multimodal perception, decision-making and planning, embodied intelligence
