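Abstract: Traditional word sense disambiguation relies only on contextual information, which leads to low accuracy. To address this, a multi-strategy unsupervised automatic word sense disambiguation method is proposed. Using rich semantic knowledge extracted from online Wikipedia, the method linearly fuses three features: context, background knowledge and semantic information. It learns the weight of each feature with a logistic regression algorithm and selects the candidate with the maximum fused value as the best sense. The method achieves an average accuracy of 85.50% on the SENSEVAL dataset, which verifies its effectiveness.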
Keywords: word sense disambiguation, Wikipedia, knowledge base, unsupervised learning
Abstract: Most traditional Word Sense Disambiguation (WSD) methods rely only on contextual information, which often leads to inaccurate results. This paper proposes a multi-strategy unsupervised automatic WSD method. The method uses rich semantic information extracted from online Wikipedia and makes a linear fusion of three features: contextual information, background knowledge and semantic information. It learns the weight of each feature with a logistic regression algorithm and selects the candidate sense with the maximum combined value as the correct meaning. Experimental results on the SENSEVAL dataset show an average precision of 85.50%, which validates the feasibility and effectiveness of the method.
Key words: Word Sense Disambiguation (WSD), Wikipedia, knowledge base, unsupervised learning
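The abstract states the selection rule only at a high level: a weighted linear combination of three feature scores, with the weights learned by logistic regression, followed by an argmax over candidate senses. The sketch below illustrates one reading of that rule. It assumes that each candidate sense already has three numeric scores (context, background knowledge, semantic information) and that labeled (feature vector, correct or not) pairs are available for fitting the weights. The abstract does not say how the Wikipedia-based scores are computed or how training labels are obtained in the unsupervised setting. The function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature order: (context, background knowledge, semantic info).
Features = Tuple[float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def learn_weights(samples: Sequence[Tuple[Features, int]],
                  lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic-regression weights w and bias b by batch gradient descent.

    Each sample pairs a candidate sense's features with 1 (correct sense) or
    0 (wrong sense). Where such labels come from is an assumption here.
    """
    n_feat = len(samples[0][0])
    w = [0.0] * n_feat
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in samples:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(n_feat):
                grad_w[i] += err * x[i]
            grad_b += err
        # Average gradient of the log-loss over the batch.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fused_score(x: Sequence[float], w: Sequence[float]) -> float:
    """Linear fusion: weighted sum of the feature scores.

    The bias is omitted because it adds the same constant to every candidate
    and cannot change which one is maximal.
    """
    return sum(wi * xi for wi, xi in zip(w, x))


def disambiguate(candidates: Dict[str, Features], w: Sequence[float]) -> str:
    """Return the candidate sense with the maximum fused value."""
    return max(candidates, key=lambda sense: fused_score(candidates[sense], w))


if __name__ == "__main__":
    # Toy, made-up training data: correct senses score higher on every feature.
    training = [
        ((0.9, 0.7, 0.8), 1), ((0.2, 0.3, 0.1), 0),
        ((0.6, 0.9, 0.7), 1), ((0.4, 0.2, 0.3), 0),
    ]
    w, _ = learn_weights(training)
    senses = {"apple (fruit)": (0.8, 0.6, 0.7), "Apple Inc.": (0.3, 0.2, 0.4)}
    print(disambiguate(senses, w))  # prints: apple (fruit)
```

Learning the weights separately and then ranking candidates by the fused value keeps the scoring step independent of how the three features are produced. Any per-sense scoring function could therefore be plugged in, provided it returns the three numbers in the same order.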
SHI Tian-yi, LI Ming-lu. Automatic Word Sense Disambiguation Method Based on Wikipedia[J]. Computer Engineering, 2009, 35(18): 62-65.