[1] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[EB/OL]. [2021-06-10]. https://arxiv.org/abs/1312.5602.
[2] TAMPUU A, MATIISEN T, KODELJA D, et al. Multiagent cooperation and competition with deep reinforcement learning[J]. PLoS One, 2017, 12(4): 1-10.
[3] CHEN Q S. Study and implement of reinforcement learning in biped robot balance control[D]. Guangzhou: South China University of Technology, 2016. (in Chinese)
[4] ZHANG T H, KAHN G, LEVINE S, et al. Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search[C]//Proceedings of IEEE International Conference on Robotics and Automation. Washington D.C., USA: IEEE Press, 2016: 528-535.
[5] DUAN Y, CHEN X, HOUTHOOFT R, et al. Benchmarking deep reinforcement learning for continuous control[C]//Proceedings of the 33rd International Conference on Machine Learning. New York, USA: ACM Press, 2016: 1329-1338.
[6] CAICEDO J C, LAZEBNIK S. Active object localization with deep reinforcement learning[C]//Proceedings of IEEE International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2015: 2488-2496.
[7] HANSEN S. Using deep Q-learning to control optimization hyperparameters[EB/OL]. [2021-06-10]. https://arxiv.org/abs/1602.04062.
[8] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge, USA: MIT Press, 1998.
[9] DEGRIS T, WHITE M, SUTTON R S. Off-policy actor-critic[EB/OL]. [2021-06-10]. https://arxiv.org/pdf/1205.4839.pdf.
[10] ZHANG A, CASARI A. Feature engineering for machine learning[M]. [S.l.]: O'Reilly Media, 2018.
[11] SCOTT S, MATWIN S. Feature engineering for text classification[C]//Proceedings of the 16th International Conference on Machine Learning. Berlin, Germany: 1999: 379-388.
[12] DEWEY D. Reinforcement learning and the reward engineering principle[C]//Proceedings of 2014 AAAI Spring Symposium. Palo Alto, USA: AAAI Press, 2014: 1-10.
[13] WANG Z Q, WU J G. Mobile robot path planning based on RDC-Q learning algorithm[J]. Computer Engineering, 2014, 40(6): 211-214. (in Chinese)
[14] SINGH S, BARTO A G, CHENTANEZ N. Intrinsically motivated reinforcement learning[EB/OL]. [2021-06-10]. https://www.researchgate.net/profile/Satinder-Singh-3/publication/221619598_Intrinsically_Motivated_Reinforcement_Learning/links/55ad05af08aee079921caa19/Intrinsically-Motivated-Reinforcement-Learning.pdf.
[15] SORG J, SINGH S, LEWIS R, et al. Internal rewards mitigate agent boundedness[C]//Proceedings of the 27th International Conference on Machine Learning. New York, USA: ACM Press, 2010: 1007-1014.
[16] SORG J, SINGH S, LEWIS R. Reward design via online gradient ascent[C]//Proceedings of the 23rd International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2010: 2190-2198.
[17] BU L Z. Study of robot arm control based on deep reinforcement learning[D]. Xuzhou: China University of Mining and Technology, 2019. (in Chinese)
[18] NAGPAL R, KRISHNAN A U, YU H S. Reward engineering for object pick and place training[EB/OL]. [2020-06-10]. https://arxiv.org/abs/2001.03792.
[19] WEI J, YANG H X, XIE H X. Solution of inverse kinematics based on immune RBF neural network[J]. Computer Engineering, 2010, 36(22): 192-194. (in Chinese)
[20] ZHENG J T. Simulation for manipulator trajectory planning based on deep reinforcement learning[D]. Chengdu: University of Electronic Science and Technology of China, 2020. (in Chinese)
[21] LI H Y, ZHAO Z L, GU L, et al. Robot arm control method based on deep reinforcement learning[J]. Journal of System Simulation, 2019, 31(11): 2452-2457. (in Chinese)