
Computer Engineering ›› 2008, Vol. 34 ›› Issue (12): 169-171. doi: 10.3969/j.issn.1000-3428.2008.12.060

• Artificial Intelligence and Recognition Technology •

Lexical Acquisition Method Based on Selection Preference

WANG Da-liang (王大亮)¹, JIANG Hong-chao (蒋宏潮)¹, TU Xu-yan (涂序彦)¹, ZHENG Xue-feng (郑雪峰)¹, TONG Zi-jian (佟子健)²

  1. School of Information Engineering, University of Science and Technology Beijing, Beijing 100083; 2. Department of Research & Development, Sohu.com Inc., Beijing 100084
  • Online: 2008-06-20 Published: 2008-06-20

Abstract: An analysis of several statistical evaluation methods shows that mutual information can measure the independence of the two words in a bigram and so filter out chance bigrams; that the χ² test evaluates the selection preference of word combinations more reasonably and is suited to acquiring frequent bigrams; and that the log-likelihood ratio test can effectively acquire sparse bigrams, compensating for the sparse-data problem that the other methods cannot overcome. Combining mutual information, the χ² test, and the log-likelihood ratio test with heuristic rules based on lexical subcategorization frames, this paper proposes a clearly layered lexical acquisition method that integrates multiple statistical evaluation measures.

Key words: natural language processing, lexical acquisition, unknown word detection, selection preference, statistical evaluation method
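
As a rough illustration of the three measures named in the abstract, the sketch below computes pointwise mutual information, Pearson's χ² statistic, and the log-likelihood ratio (G²) for a single bigram from its 2×2 contingency table. It is not the paper's implementation: the function names, the count arguments (`c12`, `c1`, `c2`, `n`), and the example counts are assumptions made for illustration, and the abstract does not state the thresholds used or the order in which the measures are applied.

```python
import math


def contingency(c12, c1, c2, n):
    """Observed 2x2 counts for a bigram (w1, w2).

    c12: count of the bigram "w1 w2"
    c1:  count of bigrams whose first word is w1
    c2:  count of bigrams whose second word is w2
    n:   total number of bigrams in the corpus
    (All argument names are hypothetical.)
    """
    o11 = c12                  # w1 followed by w2
    o12 = c1 - c12             # w1 followed by another word
    o21 = c2 - c12             # another word followed by w2
    o22 = n - c1 - c2 + c12    # neither w1 first nor w2 second
    return o11, o12, o21, o22


def pmi(c12, c1, c2, n):
    """Pointwise mutual information in bits; assumes c12, c1, c2 > 0."""
    return math.log2(c12 * n / (c1 * c2))


def chi_square(c12, c1, c2, n):
    """Pearson's chi-square statistic for the 2x2 table.

    Assumes every row and column total is non-zero.
    """
    o11, o12, o21, o22 = contingency(c12, c1, c2, n)
    numerator = n * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return numerator / denominator


def log_likelihood_ratio(c12, c1, c2, n):
    """G^2 = 2 * sum(O * ln(O / E)) over the four cells of the table."""
    observed = contingency(c12, c1, c2, n)
    o11, o12, o21, o22 = observed
    rows = (o11 + o12, o21 + o22)
    cols = (o11 + o21, o12 + o22)
    expected = (
        rows[0] * cols[0] / n,
        rows[0] * cols[1] / n,
        rows[1] * cols[0] / n,
        rows[1] * cols[1] / n,
    )
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


if __name__ == "__main__":
    # Hypothetical counts: the pair co-occurs 50 times in 10,000 bigrams.
    counts = dict(c12=50, c1=100, c2=100, n=10_000)
    print("PMI:", pmi(**counts))
    print("chi-square:", chi_square(**counts))
    print("log-likelihood ratio:", log_likelihood_ratio(**counts))
```

When the observed counts match independence exactly, all three scores are zero; each score grows as the pair co-occurs more often than chance predicts. A pipeline along the lines the abstract describes would compare these scores against chosen cut-offs, which are left out here because the source does not give them.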
