Hyponymy Relation Validation Combined with Context and Brown Clustering Feature

doi:10.3969/j.issn.1000-3428.2015.02.028

Computer Engineering (计算机工程)

ZHANG Zhichang, CHEN Songyi, LIU Xin, MA Huifang

(School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China)

Received: 2014-03-04 Online: 2015-02-15 Published: 2015-02-13
About the authors: ZHANG Zhichang (born 1976), male, associate professor, Ph.D.; main research interests: natural language processing and Web mining. CHEN Songyi and LIU Xin, master's students. MA Huifang, associate professor, Ph.D.
Funding:
National Natural Science Foundation of China (61163039, 61163036, 61363058); Northwest Normal University Young Teachers' Research Capability Improvement Program (NWNU-LKQN-10-2).

Abstract

Abstract: Hyponymy (the hypernym-hyponym relation) has many important applications in Natural Language Processing (NLP), and automatically extracting hyponymy relations from massive text corpora is an important NLP research task. Once candidate relations have been extracted with a simple pattern matching method, the difficult problem is validating and filtering them. To address this, the paper computes a context feature similarity (SimCF) and a Brown clustering similarity (SimBrown) between words, and proposes a hyponymy validation method that clusters the set of candidate hyponyms using a feature that combines the two similarities. The validation model and the weight coefficient that combines the two similarities are derived from the SimCF and SimBrown values computed on a small amount of labeled training data. The method filters coarse extraction results effectively without relying on any existing lexical relation dictionary or knowledge base. Experiments on the CCF NLP&CC2012 word semantic relation evaluation corpus show that the method markedly improves the F-measure compared with other approaches, including pattern matching and simple context comparison.

Key words: hyponymy relation, context similarity, Brown clustering similarity, Pointwise Mutual Information (PMI), pattern matching, clustering validation
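The abstract does not give the formulas, so the following is only a minimal sketch of how two such similarities might be combined and used to filter candidate hyponyms. Everything in it is an assumption for illustration: the cosine measure over sparse context vectors (for example PMI-weighted), the shared-prefix measure over Brown cluster bit strings, the linear weight `alpha`, and the average-similarity filter that stands in for the clustering step. None of these names or choices are taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse context vectors given as dicts.

    Assumption: the weights could be PMI values; the paper's exact
    context weighting is not stated in the abstract.
    """
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def brown_similarity(path_a, path_b):
    """Shared-prefix length of two Brown cluster bit strings,
    divided by the longer path length (hypothetical measure)."""
    if not path_a or not path_b:
        return 0.0
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return common / max(len(path_a), len(path_b))


def combined_similarity(sim_cf, sim_brown, alpha):
    """Linear combination; alpha is an assumed weight in [0, 1]."""
    return alpha * sim_cf + (1.0 - alpha) * sim_brown


def filter_candidates(candidates, sim, threshold):
    """Keep candidates whose average similarity to the other
    candidates reaches the threshold.

    This is a crude stand-in for clustering the candidate set,
    not the paper's algorithm. A lone candidate is kept as-is.
    """
    kept = []
    for cand in candidates:
        others = [o for o in candidates if o != cand]
        if not others:
            kept.append(cand)
            continue
        avg = sum(sim(cand, o) for o in others) / len(others)
        if avg >= threshold:
            kept.append(cand)
    return kept
```

In use, `sim` would be a function of two words that looks up their context vectors and Brown cluster paths and returns `combined_similarity(...)`. In a supervised setting, `alpha` and `threshold` could be tuned on a small labeled set, which is in the spirit of the weight coefficient the abstract mentions.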

CLC number: TP18

ZHANG Zhichang, CHEN Songyi, LIU Xin, MA Huifang. Hyponymy Relation Validation Combined with Context and Brown Clustering Feature[J]. Computer Engineering.

https://www.ecice06.com/CN/Y2015/V41/I2/145

Editor: JIN Hukao (金胡考)

