Computer Engineering

• Artificial Intelligence and Recognition Technology •

Textual Entailment Recognition Based on Mixed Topic Model

SHENG Yaqi, ZHANG Han, LV Chen, JI Donghong

  1. (School of Computer, Wuhan University, Wuhan 430072, China)
  • Received: 2014-06-09 Online: 2015-05-15 Published: 2015-05-15
  • About the authors: SHENG Yaqi (born 1991), female, master's student, main research interests: natural language processing and textual entailment; ZHANG Han, master's student, CCF member; LV Chen, doctoral student; JI Donghong, professor, Ph.D., doctoral supervisor.
  • Funding:
    General Program of the National Natural Science Foundation of China, "Research on Resource Construction and Statistical Analysis of Chinese Textual Inference" (61173062).

Abstract: This paper analyzes the mainstream methods for recognizing textual entailment and, based on the conjecture that a text T and a hypothesis H can both be generated from latent mixed topics, proposes a mixed topic model for recognizing textual entailment, together with a probabilistic model that generates text over the mixed topics. The model treats the text T and the hypothesis H as different expressions of the same meaning, represented as multi-mode data: if T entails H, the two have similar topic distributions and share a mixed vocabulary and a common set of topics. Comparative experiments between the mixLDA and LDA models are designed, and the approach is tested on the RTE-8 task, where a Support Vector Machine (SVM) classifies each pair using the sentence similarity produced by the model together with other lexical and syntactic features. The experimental results show that textual entailment recognition based on the mixed topic model achieves relatively high accuracy.

Key words: textual entailment, topic model, multi-mode, mixed topic, latent semantics, Support Vector Machine (SVM)
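
The abstract's central idea, that an entailing text-hypothesis pair should have similar topic distributions, can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors use, so cosine similarity is an assumption here, and the function name, variable names, and all numbers are hypothetical. In the described system a score of this kind would be one feature among several passed to the SVM.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_p * norm_q)


# Hypothetical topic distributions over three shared topics (made-up values,
# not taken from the paper).
theta_text = [0.7, 0.2, 0.1]
theta_hypothesis_close = [0.6, 0.3, 0.1]  # topic mix resembling the text
theta_hypothesis_far = [0.1, 0.1, 0.8]    # topic mix unlike the text

sim_close = cosine_similarity(theta_text, theta_hypothesis_close)
sim_far = cosine_similarity(theta_text, theta_hypothesis_far)
```

Under these assumed values, `sim_close` is larger than `sim_far`, which is the kind of signal a downstream classifier could use; the real model infers the distributions from data rather than taking them as given.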
