[1] SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of Go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489.
[2] VINYALS O,BABUSCHKIN I,CZARNECKI W M,et al.Grandmaster level in StarCraft II using multi-agent reinforcement learning[J].Nature,2019,575(7782):350-354.
[3] WEN Kaige,YANG Zhaohui.Intersection signal control based on reinforcement learning with CMAC[J].Computer Engineering,2011,37(17):152-154.(in Chinese)
[4] BROCKMAN G,CHEUNG V,PETTERSSON L,et al.OpenAI Gym[EB/OL].(2016-06-05)[2020-02-20].https://arxiv.org/pdf/1606.01540.pdf.
[5] TODOROV E,EREZ T,TASSA Y.MuJoCo:a physics engine for model-based control[C]//Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems.Washington D.C.,USA:IEEE Press,2012:5026-5033.
[6] TASSA Y,DORON Y,MULDAL A,et al.DeepMind control suite[EB/OL].(2018-01-02)[2020-02-20].https://arxiv.org/pdf/1801.00690.pdf.
[7] WYMANN B,ESPIÉ E,GUIONNEAU C,et al.TORCS,the open racing car simulator[EB/OL].(2015-03-12)[2020-02-20].http://www.cse.chalmers.se/~chrdimi/papers/torcs.pdf.
[8] DUAN Y,CHEN X,HOUTHOOFT R,et al.Benchmarking deep reinforcement learning for continuous control[C]//Proceedings of International Conference on Machine Learning.New York,USA:[s.n.],2016:1329-1338.
[9] BEATTIE C,LEIBO J Z,TEPLYASHIN D,et al.DeepMind lab[EB/OL].(2016-12-12)[2020-02-20].https://arxiv.org/pdf/1612.03801.pdf.
[10] COUMANS E,BAI Y.PyBullet,a python module for physics simulation for games,robotics and machine learning[EB/OL].(2016-01-13)[2020-02-20].https://github.com/bulletphysics/bullet3.
[11] VINYALS O,EWALDS T,BARTUNOV S,et al.StarCraft II:a new challenge for reinforcement learning[EB/OL].(2017-08-16)[2020-02-20].https://arxiv.org/pdf/1708.04782.pdf.
[12] VINITSKY E,KREIDIEH A,FLEM L,et al.Benchmarks for reinforcement learning in mixed-autonomy traffic[C]//Proceedings of Conference on Robot Learning.Zurich,Switzerland:[s.n.],2018:399-409.
[13] EREZ T,TASSA Y,TODOROV E.Simulation tools for model-based robotics:comparison of Bullet,Havok,MuJoCo,ODE and PhysX[C]//Proceedings of IEEE International Conference on Robotics and Automation.Washington D.C.,USA:IEEE Press,2015:4397-4404.
[14] DASARI S,EBERT F,TIAN S,et al.RoboNet:large-scale multi-robot learning[EB/OL].(2019-10-24)[2020-02-20].https://arxiv.org/pdf/1910.11215.pdf.
[15] MANDLEKAR A,ZHU Y,GARG A,et al.RoboTurk:a crowdsourcing platform for robotic skill learning through imitation[EB/OL].(2018-11-07)[2020-02-20].https://arxiv.org/pdf/1811.02790.pdf.
[16] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[17] POZZI M,PRATTICHIZZO D,MALVEZZI M.On-line educational resources on robotics:a review[C]//Proceedings of International Conference on Inclusive Robotics for a Better Society.Pisa,Italy:[s.n.],2018:141-147.
[18] Aliyun.Platform of artificial intelligence[EB/OL].[2020-02-20].https://help.aliyun.com/document_detail/114522.html?spm=5176.12674308.1334604.2113pai798a73dboAt17A.
[19] TencentCloud.TI-ONE[EB/OL].[2020-02-20].https://cloud.tencent.com/document/product/851/39399.
[20] Baidu.AI studio[EB/OL].[2020-02-20].https://ai.baidu.com/ai-doc/AISTUDIO/Tk39ty6ho.
[21] LU Zhonghua,HU Tengteng,WANG Yangang,et al.The design and implementation of HPC based on Slurm for deep learning[J].E-science Technology & Application,2018,9(2):40-45.(in Chinese)
[22] CONGOTE J,SEGURA A,KABONGO L,et al.Interactive visualization of volumetric data with WebGL in real-time[C]//Proceedings of the 16th International Conference on 3D Web Technology.Anaheim,USA:[s.n.],2011:137-146.
[23] SEFRAOUI O,AISSAOUI M,ELEULDJ M.OpenStack:toward an open-source solution for cloud computing[J].International Journal of Computer Applications,2012,55(3):38-42.