[1] ZHAI C X.Interactive information retrieval:models,algorithms,and evaluation[C]//Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval.New York,USA:ACM Press,2020:2444-2447.
[2] 张明悦,金芝,赵海燕,等.机器学习赋能的软件自适应性综述[J].软件学报,2020,31(8):2404-2431. ZHANG M Y,JIN Z,ZHAO H Y,et al.Survey of machine learning enabled software self-adaptation[J].Journal of Software,2020,31(8):2404-2431.(in Chinese)
[3] 刘全,翟建伟,章宗长,等.深度强化学习综述[J].计算机学报,2018,41(1):1-27. LIU Q,ZHAI J W,ZHANG Z C,et al.A survey on deep reinforcement learning[J].Chinese Journal of Computers,2018,41(1):1-27.(in Chinese)
[4] 宋健,王子磊.基于值分解的多目标多智能体深度强化学习方法[J].计算机工程,2023,49(1):31-40. SONG J,WANG Z L.Multi-goal multi-agent deep reinforcement learning method based on value decomposition[J].Computer Engineering,2023,49(1):31-40.(in Chinese)
[5] 朱斐,吴文,伏玉琛,等.基于双深度网络的安全深度强化学习方法[J].计算机学报,2019,42(8):1812-1826. ZHU F,WU W,FU Y C,et al.A dual deep network based secure deep reinforcement learning method[J].Chinese Journal of Computers,2019,42(8):1812-1826.(in Chinese)
[6] 刘成浩,朱斐,刘全.基于优化子目标数的Option-Critic算法[J].计算机学报,2021,44(9):1922-1933. LIU C H,ZHU F,LIU Q.Option-Critic algorithm based on sub-goal quantity optimization[J].Chinese Journal of Computers,2021,44(9):1922-1933.(in Chinese)
[7] HA D,SCHMIDHUBER J.World models[EB/OL].[2022-10-25].https://arxiv.org/abs/1803.10122.
[8] ZHANG S T,YAO H S,WHITESON S.Breaking the deadly triad with a target network[EB/OL].[2022-10-25].https://arxiv.org/abs/2101.08862v4.
[9] ZOU L X,XIA L,DING Z Y,et al.Reinforcement learning to optimize long-term user engagement in recommender systems[C]//Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.New York,USA:ACM Press,2019:2810-2818.
[10] IE E,HSU C W,MLADENOV M,et al.RecSim:a configurable simulation platform for recommender systems[EB/OL].[2022-10-25].https://arxiv.org/abs/1909.04847v2.
[11] SHI B,OZSOY M G,HURLEY N,et al.PyRecGym:a reinforcement learning gym for recommender systems[C]//Proceedings of the 13th ACM Conference on Recommender Systems.New York,USA:ACM Press,2019:491-495.
[12] HUANG J,OOSTERHUIS H,DE RIJKE M,et al.Keeping dataset biases out of the simulation:a debiased simulator for reinforcement learning based recommender systems[C]//Proceedings of the 14th ACM Conference on Recommender Systems.New York,USA:ACM Press,2020:190-199.
[13] FUJIMOTO S,MEGER D,PRECUP D.Off-policy deep reinforcement learning without exploration[EB/OL].[2022-10-25].https://arxiv.org/pdf/1812.02900.pdf.
[14] ZHANG Y,FENG F L,HE X N,et al.Causal intervention for leveraging popularity bias in recommendation[C]//Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval.New York,USA:ACM Press,2021:11-20.
[15] KINGMA D P,WELLING M.Auto-encoding variational Bayes[EB/OL].[2022-10-25].https://arxiv.org/pdf/1312.6114v1.pdf.
[16] LOUIZOS C,SHALIT U,MOOIJ J,et al.Causal effect inference with deep latent-variable models[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems.Red Hook,USA:Curran Associates Inc.,2017:6449-6459.
[17] DULAC-ARNOLD G,EVANS R,HASSELT H V,et al.Deep reinforcement learning in large discrete action spaces[EB/OL].[2022-10-25].https://arxiv.org/pdf/1512.07679.pdf.
[18] XIAO T,WANG D L.A general offline reinforcement learning framework for interactive recommendation[C]//Proceedings of the 35th AAAI Conference on Artificial Intelligence.[S.l.]:AAAI Press,2021:4512-4520.
[19] 周运腾,张雪英,李凤莲,等.Q-learning算法优化的SVDPP推荐算法[J].计算机工程,2021,47(2):46-51. ZHOU Y T,ZHANG X Y,LI F L,et al.SVDPP recommendation algorithm optimized by Q-learning algorithm[J].Computer Engineering,2021,47(2):46-51.(in Chinese)
[20] 金志军,王浩,方宝富.稀疏场景下基于理性好奇心的多智能体强化学习[J].计算机工程,2023,49(5):302-309. JIN Z J,WANG H,FANG B F.Multi-agent reinforcement learning based on rational curiosity in sparse scenarios[J].Computer Engineering,2023,49(5):302-309.(in Chinese)
[21] 周瑞朋,秦进.基于最佳子策略记忆的强化探索策略[J].计算机工程,2022,48(2):106-112. ZHOU R P,QIN J.Reinforcement exploration strategy based on best sub-strategy memory[J].Computer Engineering,2022,48(2):106-112.(in Chinese)
[22] ZOU L X,XIA L,DU P,et al.Pseudo Dyna-Q:a reinforcement learning framework for interactive recommendation[C]//Proceedings of the 13th International Conference on Web Search and Data Mining.New York,USA:ACM Press,2020:816-824.
[23] 梁星星,冯旸赫,黄金才,等.基于自回归预测模型的深度注意力强化学习方法[J].软件学报,2020,31(4):948-966. LIANG X X,FENG Y H,HUANG J C,et al.Novel deep reinforcement learning algorithm based on attention-based value function and autoregressive environment model[J].Journal of Software,2020,31(4):948-966.(in Chinese)
[24] 韦炜,全渝娟,卓奕涛,等.基于多阶马尔可夫预测的个性化推荐算法[J].计算机工程,2015,41(11):59-66. WEI W,QUAN Y J,ZHUO Y T,et al.Personalized recommendation algorithm based on multi-order Markov prediction[J].Computer Engineering,2015,41(11):59-66.(in Chinese)
[25] CHEN J W,DONG H D,WANG X,et al.Bias and debias in recommender system:a survey and future directions[EB/OL].[2022-10-25].https://arxiv.org/abs/2010.03240v2.
[26] MARLIN B M,ZEMEL R S,ROWEIS S,et al.Collaborative filtering and the missing at random assumption[EB/OL].[2022-10-25].https://arxiv.org/ftp/arxiv/papers/1206/1206.5267.pdf.
[27] ROHDE D,BONNER S,DUNLOP T,et al.RecoGym:a reinforcement learning environment for the problem of product recommendation in online advertising[EB/OL].[2022-10-25].https://arxiv.org/pdf/1808.00720.pdf.
[28] RENDLE S,FREUDENTHALER C,GANTNER Z,et al.BPR:Bayesian personalized ranking from implicit feedback[C]//Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence.Arlington,USA:AUAI Press,2009:452-461.
[29] ZHAO X Y,ZHANG L,DING Z Y,et al.Recommendations with negative feedback via pairwise deep reinforcement learning[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.New York,USA:ACM Press,2018:1040-1048.
[30] VAN HASSELT H,GUEZ A,SILVER D.Deep reinforcement learning with double Q-learning[EB/OL].[2022-10-25].https://arxiv.org/pdf/1509.06461.pdf.
[31] HO J,ERMON S.Generative adversarial imitation learning[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems.Red Hook,USA:Curran Associates Inc.,2016:4572-4580.
[32] SHANG W J,YU Y,LI Q Y,et al.Environment reconstruction with hidden confounders for reinforcement learning based recommendation[C]//Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.New York,USA:ACM Press,2019:566-576.
[33] HE X N,DENG K,WANG X,et al.LightGCN:simplifying and powering graph convolution network for recommendation[C]//Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval.New York,USA:ACM Press,2020:639-648.
[34] SCHNABEL T,SWAMINATHAN A,SINGH A,et al.Recommendations as treatments:debiasing learning and evaluation[EB/OL].[2022-10-25].https://arxiv.org/pdf/1602.05352.pdf.
[35] GUO S Y,ZOU L X,LIU Y D,et al.Enhanced doubly robust learning for debiasing post-click conversion rate estimation[C]//Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval.New York,USA:ACM Press,2021:275-284.