Computer Engineering

Textual Entailment Recognition Fused with Syntactic Structure Transformation and Lexical Semantic Features

ZHANG Zhichang, YAO Dongren, LIU Xia, CHEN Songyi, LU Xiaoyong

  1. (College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China)
  • Received: 2014-11-19 Online: 2015-09-15 Published: 2015-09-15

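  • About the authors: ZHANG Zhichang (born 1976), male, associate professor, Ph.D.; main research interests: natural language processing and data mining. YAO Dongren, LIU Xia, and CHEN Songyi, master's students. LU Xiaoyong, engineer.
  • Funding:
    National Natural Science Foundation of China (61163039, 61163036, 61363058); Northwest Normal University Young Teachers' Research Capability Promotion Program (NWNU-LKQN-10-2, NWNU-LKQN-12-23).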

Abstract: Traditional textual entailment recognition methods operate only at the lexical level and do not take syntactic or semantic information into account, which lowers the F value of the recognition results. To address this problem, a Chinese textual entailment recognition method is proposed that combines syntactic structure transformation with traditional lexical semantic features. The method preprocesses the text by transforming its syntactic parse tree, adds parse-derived features suited to textual entailment recognition to the related statistical and lexical semantic features, uses statistical machine learning to classify the entailment relation of each pair consisting of a text T and a hypothesis H, and obtains the final recognition result after correction by semantic rules. Evaluation results on NTCIR RITE3 show that the method obtains a higher F value than III&CYUT, Yamraj, and other systems.

Key words: Chinese textual entailment, syntactic structure transformation, lexical semantic feature, lexical statistical feature, statistical machine learning
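
The stages named in the abstract (features, classification of the T/H pair, rule-based correction) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names, the threshold classifier standing in for a trained statistical model, the negation rule, and the English toy tokens are all assumptions made for illustration, and the syntactic tree transformation step is omitted.

```python
from collections import Counter

# Hypothetical negation cue list; the paper's actual semantic rules are not given here.
NEGATIONS = {"not", "no", "never"}


def lexical_features(t_tokens, h_tokens):
    """Compute simple word-level statistics for a (T, H) token pair."""
    t_counts, h_counts = Counter(t_tokens), Counter(h_tokens)
    # Multiset intersection: how many H tokens are matched by tokens in T.
    overlap = sum((t_counts & h_counts).values())
    return {
        "h_coverage": overlap / len(h_tokens) if h_tokens else 0.0,
        "length_ratio": len(h_tokens) / len(t_tokens) if t_tokens else 0.0,
    }


def classify(features, threshold=0.8):
    """Stand-in for a trained classifier: entailment if H is mostly covered by T."""
    return features["h_coverage"] >= threshold


def apply_rules(t_tokens, h_tokens, label):
    """Illustrative correction rule: a negation mismatch overturns a positive label."""
    t_negated = sum(tok in NEGATIONS for tok in t_tokens) % 2
    h_negated = sum(tok in NEGATIONS for tok in h_tokens) % 2
    if label and t_negated != h_negated:
        return False
    return label


def recognize(t_tokens, h_tokens):
    """Run features -> classification -> rule correction on one pair."""
    label = classify(lexical_features(t_tokens, h_tokens))
    return apply_rules(t_tokens, h_tokens, label)
```

In this sketch the rule stage exists to catch pairs that look entailed by word overlap alone, such as a hypothesis that drops a negation present in the text.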
