
Computer Engineering ›› 2026, Vol. 52 ›› Issue (6): 1-16. doi: 10.19678/j.issn.1000-3428.0260356

• Frontier Perspectives and Reviews •

Survey of World Models Based on Large Models (Invited)

ZHAO Xiang1,*, HEI Mengzhe2, LI Jiaxu1, PANG Ning3, CHEN Ziyang1

  1. National Key Laboratory of Big Data and Decision, National University of Defense Technology, Changsha 410005, Hunan, China
  2. National Key Laboratory of Information Systems Engineering, National University of Defense Technology, Changsha 410005, Hunan, China
  3. Aviation University of Air Force, Changchun 130000, Jilin, China
  • Received: 2026-03-20 Revised: 2026-04-22 Online: 2026-06-15 Published: 2026-06-02
  • Corresponding author: ZHAO Xiang
  • About the authors:

    ZHAO Xiang, male, professor, Ph.D.; main research interest: big data knowledge engineering

    HEI Mengzhe (co-first author), male, doctoral student; main research interests: world models and situation prediction for trending events

    LI Jiaxu, doctoral student

    PANG Ning, lecturer, Ph.D.

    CHEN Ziyang, doctoral student

  • Funding:
    National Natural Science Foundation of China (U25B2047, 62272469, 72501299)

Abstract:

A world model is generally understood to comprehend and represent the external world and to predict its future states from the current world state and actions. Large models, such as the large language models GPT-4 and LLaMA, rely on massive training data and very large parameter counts to achieve outstanding capabilities in learning, understanding, representing, and generating textual knowledge. In recent years, research on world models has attracted considerable attention from both industry and academia, producing a large body of research and commercial results in domains such as autonomous driving, social simulation, embodied intelligence, and video generation. Researchers have also applied the advances of various large models to world models, further improving their performance. This paper comprehensively reviews world models built on large models across different domains, covering both large language model-based and Vision Large Model (VLM)-based approaches, and introduces relevant models in several important application areas: embodied intelligence, smart cities, social simulation, and physical environment simulation. The paper first classifies world models by the modality of the underlying large model and points out the functional differences between world models of different modalities. It then presents important open-source resources and benchmarks for world models to help researchers in related fields quickly understand and use them. Finally, the paper concludes with a summary and an outlook on future research directions.

Keywords: world model, large model, generative large model, simulation, embodied intelligence