
Computer Engineering ›› 2020, Vol. 46 ›› Issue (1): 45-51. doi: 10.19678/j.issn.1000-3428.0053562

• Artificial Intelligence and Pattern Recognition •

End-to-End Chinese Coreference Resolution with Structural Information

FU Jian, KONG Fang

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 251006, China
  • Received:2019-01-03 Revised:2019-03-04 Online:2020-01-15 Published:2019-03-14
  • About the authors: FU Jian (born 1994), male, master's student, whose main research interests are natural language processing and coreference resolution; KONG Fang, professor.
  • Supported by:
    National Natural Science Foundation of China (61876118); National Natural Science Foundation of China, Emergency Management Project for Basic Research on Artificial Intelligence (61751206); sub-project of the National Key Research and Development Program of China (2017YFB1002101).

Abstract: Building on the end-to-end coreference resolution model proposed by LEE et al., this paper takes the characteristics of Chinese writing into account and proposes a Chinese coreference resolution model that incorporates structural information. The constituency trees of all sentences in a document are compressed, and the depth of each leaf node in the resulting document compression tree is obtained. The Structural Embedding of Constituency Tree (SECT) method is used to vectorize the structural information. Part of speech, leaf node depth in the document compression tree, and SECT information are then introduced into the model as three feature vectors for Chinese coreference resolution. Test results on the CoNLL2012 dataset show that combining the three features effectively improves the model's performance on Chinese coreference resolution: the average F1 value reaches 62.33%, which is 5.28% higher than the baseline model.

Key words: end-to-end coreference resolution, structural embedding, part of speech, constituency tree, document syntactic compression tree

CLC Number:
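
The abstract names part of speech and leaf node depth as two of the three features. As a rough illustration only, the sketch below reads the part-of-speech tag and the depth of each word from a single bracketed constituency tree. It is not the authors' implementation: the abstract does not define how the document compression tree is built, so compression across sentences is not attempted here, and the function name `leaf_depths`, the depth convention (edges from the root to the POS node), and the example sentence are all assumptions made for this sketch.

```python
from typing import List, Tuple


def leaf_depths(bracketed: str) -> List[Tuple[str, str, int]]:
    """Return (word, POS tag, depth) for every leaf of a bracketed tree.

    Assumptions of this sketch (not taken from the paper):
    - every opening bracket is immediately followed by a node label;
    - depth is the number of edges from the root to the POS (preterminal) node.
    """
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    stack: List[str] = []  # labels of the constituents that are currently open
    leaves: List[Tuple[str, str, int]] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append(tokens[i + 1])  # the label follows the bracket
            i += 2
        elif tok == ")":
            stack.pop()
            i += 1
        else:
            # A word: its POS tag is the innermost open label.
            leaves.append((tok, stack[-1], len(stack) - 1))
            i += 1
    return leaves


# Hypothetical example sentence with Chinese Treebank style labels.
tree = "(IP (NP (PN 他)) (VP (VV 喜欢) (NP (NN 音乐))))"
for word, pos, depth in leaf_depths(tree):
    print(word, pos, depth)
```

In a model of the kind the abstract describes, each tag and each depth value would typically be mapped to a learned embedding before being combined with the word representation; that step is omitted here.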