Computer Engineering

• Artificial Intelligence and Recognition Technology

Hyponymy Relation Validation Combined with Context and Brown Clustering Feature

ZHANG Zhichang, CHEN Songyi, LIU Xin, MA Huifang

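  1. (School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070)
  • Received: 2014-03-04 Online: 2015-02-15 Published: 2015-02-13
  • About the authors: ZHANG Zhichang (b. 1976), male, associate professor, Ph.D.; main research interests: natural language processing and Web mining. CHEN Songyi and LIU Xin, master's students. MA Huifang, associate professor, Ph.D.
  • Supported by:
    National Natural Science Foundation of China (61163039, 61163036, 61363058); Northwest Normal University Young Teachers' Research Capability Improvement Program (NWNU-LKQN-10-2).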

Hyponymy Relation Validation Combined with Context and Brown Clustering Feature

ZHANG Zhichang, CHEN Songyi, LIU Xin, MA Huifang

  1. (School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China)
  • Received: 2014-03-04 Online: 2015-02-15 Published: 2015-02-13

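Abstract: Automatically extracting hyponymy (hypernym-hyponym) semantic relations from massive text corpora is an important task in natural language processing. Once candidate hyponymy relations have been extracted by simple pattern matching, validating and filtering them is the difficult part. To address this, word context similarity and Brown clustering similarity are each computed, and a hyponymy validation method is proposed that clusters the set of candidate hyponyms using features that combine the two similarities. The validation model and the combination weight of the two similarities are obtained by computing context similarity and Brown clustering similarity on a small amount of labeled training data. The method needs no existing lexical relation dictionary or knowledge base and can effectively filter hyponymy extraction results. Experiments on the CCF NLP&CC2012 lexical semantic relation evaluation corpus show that, compared with methods such as pattern matching and context comparison, the method clearly improves the F-measure.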

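Key words: hyponymy relation, context similarity, Brown clustering similarity, pointwise mutual information, pattern matching, clustering validation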

Abstract: Hyponymy has many important applications in Natural Language Processing (NLP), and automatically extracting hyponymy relations from massive text corpora is an important NLP research task. The main difficulty is validating whether a candidate hyponym extracted by simple pattern matching is actually correct. By calculating context feature similarity (SimCF) and Brown clustering similarity (SimBrown), this paper proposes a hyponymy validation approach. It clusters the hyponym candidates, using a similarity feature obtained by combining SimCF and SimBrown. The combination coefficient of the two similarities is derived from the SimCF and SimBrown values between all labeled training words and their hyponyms. The model can filter coarse extraction results without any existing lexical relation dictionary or knowledge base. Evaluation on the CCF NLP&CC2012 word semantic relation corpus shows that the proposed approach significantly improves the F-measure compared with other approaches, including pattern matching and simple context comparison.
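The abstract does not give formulas for SimCF, SimBrown, or the clustering step, so the following Python sketch only illustrates the general idea of combining the two similarities and filtering candidates. Everything specific in it is an assumption rather than the authors' method: SimCF is taken as the cosine of sparse context vectors, SimBrown as the shared-prefix ratio of Brown cluster bit strings, the weight `alpha` is a fixed placeholder (the paper learns it from labeled data), and a simple average-similarity threshold stands in for the clustering of hyponym candidates. All names and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts (assumed SimCF)."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def brown_prefix_similarity(path_a, path_b):
    """Assumed SimBrown: length of the shared prefix of two Brown cluster
    bit strings, divided by the length of the longer string."""
    if not path_a or not path_b:
        return 0.0
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return common / max(len(path_a), len(path_b))


def combined_similarity(sim_cf, sim_brown, alpha):
    """Linear combination; alpha plays the role of the combination coefficient."""
    return alpha * sim_cf + (1 - alpha) * sim_brown


def filter_candidates(candidates, context_vectors, brown_paths,
                      alpha=0.5, threshold=0.3):
    """Keep candidates whose mean combined similarity to the other candidates
    reaches the threshold. This is a stand-in for the clustering step,
    not the procedure used in the paper."""
    kept = []
    for c in candidates:
        others = [o for o in candidates if o != c]
        if not others:
            kept.append(c)
            continue
        scores = [
            combined_similarity(
                cosine(context_vectors[c], context_vectors[o]),
                brown_prefix_similarity(brown_paths[c], brown_paths[o]),
                alpha,
            )
            for o in others
        ]
        if sum(scores) / len(scores) >= threshold:
            kept.append(c)
    return kept


# Toy data: candidate hyponyms of "fruit" returned by a pattern matcher.
candidates = ["apple", "pear", "car"]
context_vectors = {
    "apple": {"eat": 2.0, "sweet": 1.0},
    "pear": {"eat": 1.0, "sweet": 2.0},
    "car": {"drive": 3.0},
}
brown_paths = {"apple": "0010", "pear": "0011", "car": "1101"}

kept = filter_candidates(candidates, context_vectors, brown_paths)
print(kept)
```

In this toy setting the outlier candidate shares neither contexts nor a Brown prefix with the others, so its mean combined similarity is zero and it falls below the threshold. With real data, both `alpha` and `threshold` would need to be tuned on labeled examples.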

Key words: hyponymy relation, context similarity, Brown clustering similarity, Pointwise Mutual Information (PMI), pattern matching, clustering validation

CLC number: