Textual Entailment Recognition Based on Mixed Topic Model

doi:10.3969/j.issn.1000-3428.2015.05.033

Computer Engineering


SHENG Yaqi, ZHANG Han, LV Chen, JI Donghong

(School of Computer, Wuhan University, Wuhan 430072, China)

Received: 2014-06-09  Online: 2015-05-15  Published: 2015-05-15
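About the authors: SHENG Yaqi (1991-), female, master's student, research interests: natural language processing and textual entailment; ZHANG Han, master's student, CCF member; LV Chen, PhD student; JI Donghong, professor, PhD, doctoral supervisor.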
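Funding: General Program of the National Natural Science Foundation of China, "Research on Resource Construction and Statistical Analysis for Chinese Textual Inference" (61173062).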


Abstract

This paper analyzes the mainstream approaches to recognizing textual entailment. Starting from the conjecture that a text T and a hypothesis H can both be generated from latent mixed topics, it proposes a mixed topic model for textual entailment recognition and describes a probabilistic model that generates texts from these mixed topics. The model treats T and H as different expressions of the same meaning and represents them as multi-mode data: if T entails H, the two should have similar topic distributions and share a mixed vocabulary and set of topics. Comparative experiments between the mixLDA and LDA models are run on the RTE-8 task, in which a Support Vector Machine (SVM) classifies the sentence similarity produced by the model together with other lexical and syntactic features. Experimental results show that textual entailment recognition based on the mixed topic model achieves relatively high accuracy.

Key words: textual entailment, topic model, multi-mode, mixed topic, latent semantics, Support Vector Machine (SVM)
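The abstract does not specify the sampler, the inference procedure, or the similarity measure, so the following is only a minimal sketch under stated assumptions. It assumes that mixLDA can be approximated by standard LDA trained with collapsed Gibbs sampling on documents formed by concatenating the tokens of each T-H pair (so T and H share one vocabulary and one topic mixture during training), that the topic distributions of T and H are then inferred separately by folding each sentence into the trained model, and that their similarity is 1 minus the base-2 Jensen-Shannon divergence. The function names `train_lda`, `infer_theta`, and `js_similarity`, the hyperparameters, and the toy sentence pairs are all hypothetical.

```python
import math
import random
from collections import Counter


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA; each doc is a list of tokens."""
    rng = random.Random(seed)
    w2id = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    K, V = num_topics, len(w2id)
    nkw = [[0] * V for _ in range(K)]  # topic-word counts
    nk = [0] * K                       # tokens assigned to each topic
    ndk = [[0] * K for _ in docs]      # document-topic counts
    z = []                             # current topic of every token
    for d, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][w2id[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = w2id[w], z[d][i]
                # Remove the token's current assignment, then resample it with
                # weight (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                ndk[d][k] -= 1
                nkw[k][wid] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][wid] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wid] += 1
                nk[k] += 1
    return w2id, nkw, nk


def infer_theta(tokens, w2id, nkw, nk, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Fold one sentence (T or H) into the trained model; topic-word counts stay fixed."""
    rng = random.Random(seed)
    K, V = len(nk), len(w2id)
    words = [w2id[w] for w in tokens if w in w2id]  # out-of-vocabulary words are ignored
    if not words:
        return [1.0 / K] * K
    z = [rng.randrange(K) for _ in words]
    ndk = Counter(z)
    for _ in range(iters):
        for i, wid in enumerate(words):
            ndk[z[i]] -= 1
            weights = [(ndk[t] + alpha) * (nkw[t][wid] + beta) / (nk[t] + V * beta)
                       for t in range(K)]
            z[i] = rng.choices(range(K), weights=weights)[0]
            ndk[z[i]] += 1
    # Smoothed topic proportions from the final sample only.
    return [(ndk[t] + alpha) / (len(words) + K * alpha) for t in range(K)]


def js_similarity(p, q):
    """1 - Jensen-Shannon divergence (base 2), so the score lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    score = 1.0 - 0.5 * (kl(p, m) + kl(q, m))
    return min(1.0, max(0.0, score))  # guard against floating-point round-off


# Hypothetical toy T-H pairs; each pair is merged into ONE training document.
pairs = [
    (["team", "scored", "goal", "match"], ["team", "won", "match"]),
    (["player", "scored", "goal"], ["player", "goal"]),
    (["coach", "team", "match", "won"], ["coach", "team", "won"]),
    (["bank", "raised", "interest", "rate"], ["bank", "rate"]),
    (["stock", "market", "price", "fell"], ["market", "price", "fell"]),
    (["shares", "bank", "stock", "price"], ["shares", "stock"]),
]
w2id, nkw, nk = train_lda([t + h for t, h in pairs], num_topics=2)

theta_t = infer_theta(["team", "scored", "goal"], w2id, nkw, nk)
sim_pos = js_similarity(theta_t, infer_theta(["team", "goal"], w2id, nkw, nk))
sim_neg = js_similarity(theta_t, infer_theta(["stock", "price"], w2id, nkw, nk))
print(f"sim(T, H_pos) = {sim_pos:.3f}  sim(T, H_neg) = {sim_neg:.3f}")
```

In the setup the abstract describes, a score like `sim_pos` would be only one feature: it would be combined with lexical and syntactic features and passed to an SVM classifier, which this sketch leaves out. Using only the final Gibbs sample in `infer_theta` keeps the code short; averaging the topic proportions over several samples after burn-in would give a less noisy estimate.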

CLC number: TP391.1

Cite this article: SHENG Yaqi, ZHANG Han, LV Chen, JI Donghong. Textual Entailment Recognition Based on Mixed Topic Model[J]. Computer Engineering, doi: 10.3969/j.issn.1000-3428.2015.05.033.

http://www.ecice06.com/CN/Y2015/V41/I5/180

Editor: GU Yifei

