Computer Engineering, 2018, Vol. 44, Issue (6): 305-310. doi: 10.19678/j.issn.1000-3428.0047731

• Development Research and Engineering Application •

Identification of Uyghur Event Coreference Relationship Based on Stacked Denoising Autoencoder

WANG Shuyuan (a), TIAN Shengwei (a), YU Long (b), FENG Guanjun (c), AISHAN Wumaier (d), LI Pu (e), ZHAO Jianguo (c)

  (a) School of Software; (b) Network Center; (c) College of Humanities; (d) College of Information Science and Engineering; (e) School of Languages; Xinjiang University, Urumqi 830046, China
  • Received: 2017-06-27  Online: 2018-06-15  Published: 2018-06-15
  • About the authors: WANG Shuyuan (born 1995), female, master's student, research interest: natural language processing; TIAN Shengwei (corresponding author), professor, Ph.D.; YU Long, professor; FENG Guanjun, AISHAN Wumaier, and LI Pu, associate professors, Ph.D.; ZHAO Jianguo, associate professor, M.S.
  • Funding: National Natural Science Foundation of China (61662074, 61563051, 61262064); Key Program of the National Natural Science Foundation of China (61331011); Xinjiang Autonomous Region Science and Technology Talent Training Program (QN2016YX0051).

Abstract:

Taking the linguistic characteristics of Uyghur into account, this paper proposes a new method for identifying Uyghur event coreference relationships based on a Stacked Denoising Autoencoder (SDAE). Uyghur events are paired into candidate event pairs, and nine features are extracted for each pair, including basic event attributes, trigger words, and event distance. Because word embeddings carry rich semantic information, the semantic similarity between Uyghur event trigger words computed from word embeddings is used as one of these features. The features are used to train the SDAE model, and the SDAE output is fed into a softmax layer that classifies each candidate pair as coreferent or not, completing the Uyghur event coreference identification task. Experimental results show that, compared with the shallow machine learning model Support Vector Machine (SVM), the deep learning-based SDAE model is better suited to Uyghur event coreference identification and improves identification performance, and that the word embedding similarity feature further enhances performance.

Key words: coreference relationship, Uyghur language, semantic similarity, stacked denoising autoencoder, deep learning
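The candidate-pair construction and the embedding-based trigger similarity described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Event` fields, the English placeholder triggers standing in for Uyghur trigger words, the toy 3-dimensional vectors, and the choice of cosine similarity as the similarity measure are all hypothetical, and only four illustrative features are built rather than the paper's nine.

```python
from dataclasses import dataclass
from itertools import combinations
import math


@dataclass
class Event:
    # Hypothetical event record; the fields are illustrative assumptions,
    # not the paper's annotation schema.
    event_id: str
    trigger: str
    event_type: str
    sentence_index: int


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_pairs(events):
    """Pair every two events in a document into a candidate event pair."""
    return list(combinations(events, 2))


def pair_features(e1, e2, embeddings):
    """Build a small, hypothetical feature vector for one candidate pair.

    `embeddings` maps a trigger word to its embedding vector (assumed to be
    pretrained elsewhere).
    """
    same_type = 1.0 if e1.event_type == e2.event_type else 0.0
    same_trigger = 1.0 if e1.trigger == e2.trigger else 0.0
    distance = float(abs(e1.sentence_index - e2.sentence_index))
    similarity = cosine_similarity(embeddings[e1.trigger], embeddings[e2.trigger])
    return [same_type, same_trigger, distance, similarity]


# Toy usage: placeholder triggers and made-up vectors, not real embeddings.
embeddings = {
    "attack": [0.9, 0.1, 0.0],
    "strike": [0.8, 0.2, 0.1],
    "meeting": [0.0, 0.1, 0.9],
}
events = [
    Event("e1", "attack", "Conflict", 0),
    Event("e2", "strike", "Conflict", 2),
    Event("e3", "meeting", "Contact", 5),
]
for a, b in candidate_pairs(events):
    print(a.event_id, b.event_id, pair_features(a, b, embeddings))
```

With n events in a document, this exhaustive pairing yields n(n-1)/2 candidate pairs, each of which becomes one training or test instance for the classifier.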
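The SDAE-plus-softmax classifier described in the abstract can likewise be sketched in NumPy. This is an illustrative sketch, not the paper's configuration: the layer sizes, masking-noise rate, learning rates, epoch counts, sigmoid units, and the synthetic toy features are assumptions, and the supervised end-to-end fine-tuning of the stacked encoder that is often applied after pretraining is omitted for brevity. Each denoising autoencoder corrupts its input by randomly zeroing entries and learns to reconstruct the clean input; its hidden codes become the input of the next layer, and the final codes feed a softmax layer.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_dae(X, n_hidden, rng, corruption=0.3, lr=0.5, epochs=200):
    """Train one denoising autoencoder layer (hyperparameters are assumptions).

    Inputs are corrupted with masking noise and the network learns to
    reconstruct the clean X. Returns the encoder weights and bias.
    """
    n, d = X.shape
    W = rng.normal(0.0, 0.1, size=(d, n_hidden))      # encoder weights
    b = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, size=(n_hidden, d))  # decoder weights
    c = np.zeros(d)
    for _ in range(epochs):
        X_noisy = X * (rng.random(X.shape) > corruption)  # zero ~30% of entries
        H = sigmoid(X_noisy @ W + b)
        X_rec = sigmoid(H @ W_dec + c)
        # Gradient of 0.5 * squared error, averaged over samples, against clean X.
        d_out = (X_rec - X) * X_rec * (1.0 - X_rec) / n
        d_hid = (d_out @ W_dec.T) * H * (1.0 - H)
        W_dec -= lr * H.T @ d_out
        c -= lr * d_out.sum(axis=0)
        W -= lr * X_noisy.T @ d_hid
        b -= lr * d_hid.sum(axis=0)
    return W, b


def pretrain_sdae(X, layer_sizes, rng):
    """Greedy layer-wise pretraining: each DAE is trained on the previous codes."""
    layers, H = [], X
    for size in layer_sizes:
        W, b = train_dae(H, size, rng)
        layers.append((W, b))
        H = sigmoid(H @ W + b)  # clean (uncorrupted) codes feed the next layer
    return layers


def encode(X, layers):
    H = X
    for W, b in layers:
        H = sigmoid(H @ W + b)
    return H


def train_softmax(H, y, n_classes=2, lr=0.5, epochs=500):
    """Fit a softmax layer on the SDAE output with mean cross-entropy loss."""
    n, d = H.shape
    V, a = np.zeros((d, n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot labels: 1 = coreferent, 0 = not
    for _ in range(epochs):
        G = (softmax(H @ V + a) - Y) / n  # cross-entropy gradient w.r.t. logits
        V -= lr * H.T @ G
        a -= lr * G.sum(axis=0)
    return V, a


def predict(X, layers, V, a):
    return softmax(encode(X, layers) @ V + a).argmax(axis=1)


# Toy usage on synthetic pair features scaled to [0, 1]; not the paper's data.
rng = np.random.default_rng(42)
X_pos = np.clip(rng.normal([0.9, 0.9, 0.1, 0.9], 0.05, size=(20, 4)), 0, 1)
X_neg = np.clip(rng.normal([0.1, 0.1, 0.8, 0.1], 0.05, size=(20, 4)), 0, 1)
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [0] * 20)
layers = pretrain_sdae(X, [8, 4], rng)
V, a = train_softmax(encode(X, layers), y)
pred = predict(X, layers, V, a)
print("training accuracy:", (pred == y).mean())
```

Because the reconstruction layer uses sigmoid outputs, the sketch assumes every input feature is already scaled to [0, 1]; a raw feature such as event distance would need min-max scaling first.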

CLC number: