
Computer Engineering, 2020, Vol. 46, Issue (3): 79-86. doi: 10.19678/j.issn.1000-3428.0055783

• Artificial Intelligence and Pattern Recognition •

Research on Null Instantiation Recognition and Filling Method in Chinese Discourses

ZHANG Yueping (a), LI Ru (a, b), WANG Yuanlong (a), CHAI Qinghua (c), WU Yujuan (a), GUAN Yong (a)

  (a) School of Computer and Information Technology; (b) Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education; (c) School of Foreign Languages, Shanxi University, Taiyuan 030006, China
  • Received: 2019-08-22; Revised: 2019-09-23; Published: 2019-10-18
  • About the authors: ZHANG Yueping (1995-), female, master's student, research interest: Chinese information processing; LI Ru (corresponding author), professor; WANG Yuanlong and CHAI Qinghua, lecturers; WU Yujuan and GUAN Yong, PhD candidates.
  • Funding: National Natural Science Foundation of China project "Research on Frame Reasoning Technology for Chinese Discourse Semantic Analysis" (61772324); National Natural Science Foundation of China Young Scientists Fund project "Research on Key Technologies for Event-Based Reading Comprehension of Image-Text Data" (61806117).

Abstract: Null Instantiation (NI) recognition and filling is the task of finding, in the discourse context, fillers for the semantic roles that are missing from a given sentence. Existing methods treat the task as classification, predicting the correct filler from a candidate set, and this approach limits NI filling performance. To address this problem, this paper proposes a new NI recognition and filling method. It combines heuristic rules with a decision tree algorithm to identify the NIs whose content needs to be filled, and collects the text in the context that has already filled frame elements into a candidate set. An improved SMOTE algorithm then extends the minority-class samples, addressing the class imbalance in the candidate set. On this basis, semantic similarity features are extracted from the Chinese FrameNet (CFN) knowledge base, and the mapping relationships between frame elements are used to improve filling performance. Experimental results show that, by handling the imbalance of filling samples at the data level, the method raises the final F-measure by about 12%.

Key words: Chinese FrameNet (CFN), Null Instantiation (NI) recognition and filling, imbalanced data, semantic features, Decision Tree (DT) algorithm
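The abstract relies on an improved SMOTE algorithm to oversample the minority class of the candidate set, but it does not describe what the improvement is. As a reference point only, the sketch below implements standard SMOTE in plain Python: each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. It is not the authors' implementation. The function name `smote`, the parameters `k` and `seed`, and the toy feature vectors are hypothetical, and treating correct fillers as the minority class is an assumption.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Hypothetical sketch of STANDARD SMOTE, not the paper's improved variant.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # seeded for reproducible output
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: _euclidean(base, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # Interpolate: base + gap * (nn - base), coordinate by coordinate
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical 2-D features; assumption: correct fillers are the minority class
    positives = [[0.9, 0.8], [0.85, 0.75], [0.95, 0.9]]
    negatives = [[0.1, 0.2]] * 9  # majority class: wrong fillers
    extra = smote(positives, n_new=len(negatives) - len(positives), k=2)
    print(len(positives) + len(extra), len(negatives))  # prints "9 9"
```

Interpolating between neighbours, rather than duplicating minority samples, is what distinguishes SMOTE from plain random oversampling: the new points stay inside the region spanned by the minority class instead of stacking on existing points.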
