Dialogue Strategy Integrating Markov Decision Process and Information Entropy

Computer Engineering (计算机工程), 2021, Vol. 47, Issue 3: 284-290. doi: 10.19678/j.issn.1000-3428.0057081

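ZHU Yingbo (朱映波)¹, ZHAO Yangyang (赵阳洋)², WANG Pei (王佩)², YIN Kai (尹凯)², WANG Zhenyu (王振宇)²

1. iMusic Culture and Technology Co., Ltd., Guangzhou 510081, China;
2. School of Software Engineering, South China University of Technology, Guangzhou 510006, China

Received: 2019-12-31 Revised: 2020-03-11 Published: 2020-03-18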
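About the authors: ZHU Yingbo (born 1971), male, professor-level senior engineer, Ph.D.; his main research interests are digital music copyright protection and big data. ZHAO Yangyang, Ph.D. candidate. WANG Pei, M.S. YIN Kai, master's student. WANG Zhenyu, professor, Ph.D.
Funding: Natural Science Foundation of Guangdong Province, "Structural Analysis and Macro-level Prediction of Information Propagation in Online Social Networks" (2019A1515011792); Guangdong Province Special Fund for Applied Science and Technology R&D, Key Project "Industrial Application of a Big Data Analysis and Recommendation Platform for Mobile Internet Users" (2015B010131003); Guangzhou Science and Technology Project "Intelligent Mobile Music Search and Recommendation Platform Based on Big Data Analysis" (201802010025).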

Abstract

The dialogue strategy is an important component of a human-machine dialogue system, and its quality directly affects the performance of the whole system. In a cold-start scenario with no data at all, collecting dialogue data for dialogue strategy learning is complex and time-consuming. To maintain good performance in cold-start scenarios, this paper proposes a dialogue strategy algorithm that combines the Markov Decision Process (MDP) with information entropy. The MDP is used to quickly obtain the optimal next dialogue state. When several candidate optimal states share the same state-value function, an attribute information entropy method, combined with the knowledge base, eliminates the tied states so that the system can select the optimal response action. Experimental results on a music search dataset show that, compared with the random strategy, the rule-based algorithm and the information entropy-based algorithm, the proposed algorithm shortens dialogues by 2.24, 0.84 and 0.03 turns, respectively, and effectively improves the dialogue task completion rate.

Key words: dialogue system, dialogue strategy, cold start, information entropy, Markov Decision Process (MDP)
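The tie-breaking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy knowledge base, the state values, the function names `attribute_entropy` and `pick_slot_to_request`, and the choice to prefer the attribute with the highest Shannon entropy are all assumptions made for illustration. The sketch assumes each candidate next state corresponds to asking the user about one attribute, and that state values have already been computed by some MDP solver.

```python
import math
from collections import Counter


def attribute_entropy(records, attribute):
    """Shannon entropy (in bits) of one attribute's values across the records."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def pick_slot_to_request(state_values, records, tol=1e-9):
    """Pick the highest-valued slot; break value ties by attribute entropy.

    Assumption: a higher-entropy attribute splits the remaining
    knowledge-base entries more evenly, so asking about it is preferred.
    """
    best = max(state_values.values())
    tied = [s for s, v in state_values.items() if abs(v - best) <= tol]
    return max(tied, key=lambda s: attribute_entropy(records, s))


# Hypothetical toy knowledge base for a music search task.
kb = [
    {"artist": "A", "genre": "pop"},
    {"artist": "B", "genre": "pop"},
    {"artist": "C", "genre": "rock"},
    {"artist": "D", "genre": "pop"},
]

# Hypothetical state values in which two candidate next states tie.
values = {"artist": 1.0, "genre": 1.0}
chosen = pick_slot_to_request(values, kb)
```

In this toy example the two candidates have equal value, so the entropy term alone decides which attribute the system asks about next.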

CLC number: TP391

ZHU Yingbo, ZHAO Yangyang, WANG Pei, YIN Kai, WANG Zhenyu. Dialogue Strategy Integrating Markov Decision Process and Information Entropy[J]. Computer Engineering, 2021, 47(3): 284-290.

https://www.ecice06.com/CN/Y2021/V47/I3/284
