
Computer Engineering, 2021, Vol. 47, Issue (12): 19-29. doi: 10.19678/j.issn.1000-3428.0061116

• Hot Topics and Reviews •

Survey of Research on Deep Reinforcement Learning

YANG Siming1, SHAN Zheng1, DING Yu2, LI Gangwei3

  1. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China;
  2. 94162 Troops of PLA, Xi'an 710600, China;
  3. 78100 Troops of PLA, Chengdu 610031, China
  • Received: 2021-03-12 Revised: 2021-05-15 Published: 2021-05-24
  • About the authors: YANG Siming (born 1994), male, master's student, whose main research interests are deep learning and reinforcement learning; SHAN Zheng, professor; DING Yu, bachelor's degree; LI Gangwei, master's student.
  • Funding: National Natural Science Foundation of China (61971092, 61701503).

Abstract: Deep Reinforcement Learning (DRL) uses the feature representation capability of deep neural networks to fit the functions used in Reinforcement Learning (RL), such as those over states, actions, and values, in order to improve the performance of RL models. It has been widely applied in video games, mechanical control, recommendation systems, financial investment, and other fields. This article reviews the main development history of DRL methods and categorizes them according to current research goals. It then analyzes and discusses four problems: algorithm convergence on tasks with high-dimensional state-action spaces, improving sample efficiency in complex application scenarios, exploration when the reward function is sparse or not explicitly defined, and enhancing generalization in multi-task scenarios. Finally, it summarizes the current state of research on these four categories of DRL methods and discusses future development directions for DRL technology.

Key words: Deep Learning (DL), Reinforcement Learning (RL), Deep Reinforcement Learning (DRL), Inverse Reinforcement Learning (IRL), Model-Based Meta-Learning (MBML)
