Reinforcement Learning Platform Based on Cloud Visual Interaction

doi:10.19678/j.issn.1000-3428.0057693

Computer Engineering, 2021, Vol. 47, Issue (5): 316-320.

• Development Research and Engineering Application •

YAO Tiechui^1,2, WANG Jue^1,2, WANG Yangang^1,2, CHI Xuebin^1,2, WANG Xiaoguang^1

1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China;
2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China

Received: 2020-03-12 Revised: 2020-05-13 Published: 2020-05-21
About the authors: YAO Tiechui (born 1993), male, Ph.D. candidate, whose main research interests are reinforcement learning and high-performance computing; WANG Jue (corresponding author), associate researcher; WANG Yangang and CHI Xuebin, researchers and doctoral supervisors; WANG Xiaoguang, engineer.
Funding:
National Key Research and Development Program of China, "Tool Libraries and Domain-Specific Basic Software Packages for Large-Scale Parallel Computing" (2017YFB0202202); "China Science and Technology Cloud" Construction Project (Phase II), "Supercomputing Resource Pool Construction" (XXH13503); Science and Technology Project of State Grid Corporation of China Headquarters, "Electric Power Artificial Intelligence Experiment and Public Service Platform Technology" (SGGR0000JSJS1800569).

Abstract

Abstract: Reinforcement learning is a process of learning through interaction with an environment. In experimental settings, the scalability of training-environment deployment and the convenience of algorithm verification are often limited by tight coupling between the physics engine and the rendering module. To decouple the two, this paper proposes a Cloud Interactive Model (CIM) for physics engines and rendering modules, consisting of an operation dictionary, an element dictionary, and the corresponding algorithm interfaces, and implements a simulator based on this model. The simulator is then integrated with visualization tools, knowledge management, and other components to build a Reinforcement Learning Platform (RLP) that supports cloud visual interaction. Taking the MuJoCo physics engine as an example, the paper verifies how conveniently the Web simulator can connect to a custom physics engine. Experimental and analytical results confirm the effectiveness of the model: it can be conveniently connected to the platform, enabling cloud rendering and improving the utilization of the cluster it runs on.

Key words: Reinforcement Learning Platform (RLP), physics engine, rendering module, cloud visual interaction, interface standard
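
As a rough illustration of the decoupling idea in the abstract, the following Python sketch separates a physics engine from rendering by exchanging only dictionary data. The abstract names an operation dictionary, an element dictionary, and algorithm interfaces but does not give their contents, so `OPERATIONS`, `PhysicsEngine`, `ToyEngine`, `handle_request`, and the JSON layout are all hypothetical. The toy engine stands in for a real engine such as MuJoCo and is not the authors' implementation.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Protocol


class PhysicsEngine(Protocol):
    """Hypothetical interface that an engine adapter would implement."""

    def step(self, operation: str) -> None: ...

    def elements(self) -> Dict[str, List[float]]: ...


# Hypothetical operation dictionary: client-side operation name mapped to
# a displacement along the x axis. The entries are assumptions.
OPERATIONS: Dict[str, float] = {"left": -1.0, "right": 1.0, "stay": 0.0}


@dataclass
class ToyEngine:
    """Stand-in engine with one body moving on a line (not MuJoCo)."""

    x: float = 0.0

    def step(self, operation: str) -> None:
        # Advance the simulated state by the displacement of the operation.
        self.x += OPERATIONS[operation]

    def elements(self) -> Dict[str, List[float]]:
        # Hypothetical element dictionary: element name -> position [x, y, z].
        return {"cart": [self.x, 0.0, 0.0]}


def handle_request(engine: PhysicsEngine, operation: str) -> str:
    """Apply one operation and return the scene as JSON for a remote renderer."""
    if operation not in OPERATIONS:
        raise KeyError(f"unknown operation: {operation}")
    engine.step(operation)
    return json.dumps({"elements": engine.elements()})


engine = ToyEngine()
handle_request(engine, "right")
frame = handle_request(engine, "right")
print(frame)
```

Because the renderer in this sketch only sees the serialized element dictionary, a browser client could draw each frame without linking against the engine, and swapping engines would mean writing another adapter with the same two methods.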

CLC number: TP391

YAO Tiechui, WANG Jue, WANG Yangang, CHI Xuebin, WANG Xiaoguang. Reinforcement Learning Platform Based on Cloud Visual Interaction[J]. Computer Engineering, 2021, 47(5): 316-320.

http://www.ecice06.com/CN/Y2021/V47/I5/316
