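An Adaptive Cruise Control Algorithm Based on Deep Reinforcement Learning

Computer Engineering, 2018, Vol. 44, Issue (7): 32-35, 41. doi: 10.19678/j.issn.1000-3428.0050994

Special topic: Intelligent Transportation

HAN Xiangmin, BAO Hong, LIANG Jun, PAN Feng, XUAN Zuxing

Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China

Received: 2018-03-29  Online: 2018-07-15  Published: 2018-07-15

About the authors: HAN Xiangmin (born 1991), male, master's student, whose main research interest is intelligent driving control algorithms; BAO Hong and LIANG Jun, professors; PAN Feng and XUAN Zuxing, associate professors.

Funding: Key Support Project "Research on Brain-Cognition Technology, Platform and Translation for Intelligent Vehicle Driving" (91420202) under the National Natural Science Foundation of China Major Research Plan "Cognitive Computing of Visual and Auditory Information"; Newton Fund of the Royal Academy of Engineering, UK (UK-CIAPP/324); Support Plan for Building High-Level Teaching Teams in Beijing Municipal Universities (IDHT20170511); Scientific Research Program of the Beijing Municipal Education Commission (KM201811417006).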

Abstract

Adaptive Cruise Control (ACC) is a core technology in the field of intelligent driving. It is usually implemented with hierarchical control or variable-parameter control algorithms, but these algorithms cannot respond effectively to sudden changes in car-following conditions. To address this, this paper combines deep reinforcement learning with ACC and proposes an ACC algorithm based on the deterministic policy gradient algorithm, so that an intelligent vehicle can perform adaptive cruise control and keep improving through self-learning. Test results on an open-source platform show that, with this algorithm, the proportion of the intelligent vehicle's car-following acceleration that stays within 1.8 m/s² exceeds 90%, which reaches the car-following cruise level of a human driver.

Key words: intelligent driving, automatic control, Adaptive Cruise Control (ACC), deep reinforcement learning, deterministic policy gradient algorithm
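The acceleration criterion reported in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: the function name, the use of the absolute value of acceleration, and the sample log below are all assumptions made only to show how such a proportion might be computed from a recorded acceleration trace.

```python
from typing import Sequence


def fraction_within_limit(accelerations: Sequence[float], limit: float = 1.8) -> float:
    """Return the fraction of samples whose absolute acceleration is <= limit (m/s^2).

    Assumption: the 1.8 m/s^2 bound is applied to the magnitude of the
    acceleration, so braking and accelerating are treated symmetrically.
    """
    if len(accelerations) == 0:
        raise ValueError("accelerations must not be empty")
    within = sum(1 for a in accelerations if abs(a) <= limit)
    return within / len(accelerations)


# Hypothetical acceleration log in m/s^2, made up for illustration only.
sample_log = [0.2, -0.5, 1.1, 1.9, -0.3, 0.0, -2.1, 0.7, 1.5, -1.0]
ratio = fraction_within_limit(sample_log)
print(f"share of samples within 1.8 m/s^2: {ratio:.0%}")
```

Under this reading, the abstract's result would correspond to a ratio above 0.9 on the car-following test data.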

CLC number: TP18

HAN Xiangmin, BAO Hong, LIANG Jun, PAN Feng, XUAN Zuxing. An Adaptive Cruise Control Algorithm Based on Deep Reinforcement Learning[J]. Computer Engineering, 2018, 44(7): 32-35, 41.

http://www.ecice06.com/CN/Y2018/V44/I7/32
