
Computer Engineering, 2020, Vol. 46, Issue (4): 309-315. doi: 10.19678/j.issn.1000-3428.0054160

• Development Research and Engineering Application •

Part-of-Speech Tagging of the Zhuang Language Based on Reinforcement Learning

TANG Suqin (a, b), SUN Yaru (a), LI Zhixin (a), ZHANG Canlong (a)

  a. Guangxi Key Lab of Multi-source Information Mining and Security; b. Department of Educational Technology, Faculty of Education; Guangxi Normal University, Guilin, Guangxi 541004, China
  • Received: 2019-03-08 Revised: 2019-05-03 Issue date: 2020-04-15 Released online: 2019-05-27
  • About the authors: TANG Suqin (born 1963), female, professor, Ph.D.; her main research interests are knowledge engineering and natural language processing. SUN Yaru, master's student. LI Zhixin (corresponding author) and ZHANG Canlong, professors, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61967002, 61966004, 61663004, 61662007, 61866004); Natural Science Foundation of Guangxi (2019GXNSFDA245018, 2016GXNSFAA380146, 2017GXNSFAA198365); Guangxi Science and Technology Base and Talent Special Project (Guike AD16380008).

Abstract: Research on intelligent information processing for the Zhuang language is still in its infancy, and no automatic part-of-speech (POS) tagging method exists for it. Annotated Zhuang corpora are scarce, manual tagging is time-consuming and labor-intensive, and machine tagging performs poorly. To address these problems, this paper proposes a POS tagging method for Zhuang based on reinforcement learning. The method builds a tag dictionary from the grammatical characteristics of Zhuang and the tag set of the Penn Chinese Treebank (CTB), and fuses semantic features through dependency parsing. A Long Short-Term Memory (LSTM) network serves as the policy network, and its recurrent memory supplements the partially observed information. On this basis, a reinforcement learning framework is introduced in which the target POS tag serves as feedback from the environment, and feature learning drives the model's output progressively closer to the true target. Experimental results show that the method reduces the dependence of the POS tagging model on the training corpus, rapidly enlarges the Zhuang tag dictionary, and tags Zhuang POS automatically.

Key words: intelligent information processing, part-of-speech tagging, reinforcement learning, Long Short-Term Memory (LSTM) network, policy network
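The abstract frames tagging as a sequential decision problem: a policy picks a tag for each word, and agreement with the target tag is the environment's feedback. The following is a minimal sketch of that loop under stated assumptions, not the authors' implementation. All of the following are hypothetical: the LSTM policy network is replaced by a log-linear softmax policy whose state is the current word plus the previously chosen tag; the action space is a four-tag subset of CTB tags; the corpus (`CORPUS`) is placeholder tokens rather than real Zhuang text; and training is plain REINFORCE with reward 1 for a correct tag and 0 otherwise, with no baseline and discount `gamma` = 0 by default. The function names (`train`, `tag`, `policy`, and so on) are illustrative only.

```python
import math
import random
from collections import defaultdict

# HYPOTHETICAL action space: a small subset of Penn Chinese Treebank (CTB) tags.
TAGS = ["PN", "VV", "NN", "PU"]

# HYPOTHETICAL toy corpus of placeholder tokens (not real Zhuang text).
CORPUS = [
    (["w1", "w2", "w3", "w4"], ["PN", "VV", "NN", "PU"]),
    (["w5", "w2", "w6", "w4"], ["PN", "VV", "NN", "PU"]),
    (["w1", "w7", "w6", "w4"], ["PN", "VV", "NN", "PU"]),
]


def features(word, prev_tag):
    """State: current word plus the previously chosen tag (crude stand-in for LSTM memory)."""
    return ["w=" + word, "prev=" + prev_tag]


def policy(weights, feats):
    """Log-linear softmax policy pi(tag | state) over TAGS."""
    logits = [sum(weights[(t, f)] for f in feats) for t in TAGS]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(action, gold):
    """Environment feedback: 1 if the chosen tag matches the target tag, else 0."""
    return 1.0 if action == gold else 0.0


def discounted_returns(rewards, gamma):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for every step of an episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


def run_episode(weights, words, gold, rng):
    """Sample one tag per token left to right; each choice becomes the next state's prev tag."""
    prev, steps = "<s>", []
    for word, target in zip(words, gold):
        feats = features(word, prev)
        probs = policy(weights, feats)
        action = rng.choices(TAGS, weights=probs)[0]
        steps.append((feats, probs, action, reward(action, target)))
        prev = action
    return steps


def train(epochs=300, lr=0.5, gamma=0.0, seed=0):
    """Plain REINFORCE, no baseline. gamma=0 credits each decision only with its own reward."""
    rng = random.Random(seed)
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in CORPUS:
            steps = run_episode(weights, words, gold, rng)
            returns = discounted_returns([s[3] for s in steps], gamma)
            for (feats, probs, action, _), g in zip(steps, returns):
                for t, p in zip(TAGS, probs):
                    # d log pi(action) / d logit_t = 1[t == action] - p_t,
                    # shared by every active (binary) feature of the state.
                    delta = lr * g * ((1.0 if t == action else 0.0) - p)
                    for f in feats:
                        weights[(t, f)] += delta
    return weights


def tag(weights, words):
    """Greedy decoding: pick the most probable tag at each step."""
    prev, out = "<s>", []
    for word in words:
        probs = policy(weights, features(word, prev))
        prev = TAGS[probs.index(max(probs))]
        out.append(prev)
    return out
```

After `weights = train()`, calling `tag(weights, words)` on a training sentence should reproduce its gold tags on this toy data. Feeding the previous tag back into the state is what makes this a sequential decision problem rather than independent classification; approaching the setup the abstract describes would at least require an LSTM state, dependency-based features, and a real tag dictionary.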
