Abstract: This paper uses the conditional random field (CRF) as the probabilistic model for building a model base for word sense disambiguation. CRFs are trained separately on punctuation-delimited sentence corpora for high-frequency senses and for low-frequency senses, and the resulting model files are applied in disambiguation experiments. The probability values in the tagging output are analyzed to determine a threshold that separates correctly tagged items from incorrectly tagged ones. The better-performing model files and their corresponding thresholds are then used to build the CRF model base for word sense disambiguation. Experimental results show that modeling low-frequency senses disambiguates better than modeling high-frequency senses: accuracy exceeds 80%, and recall is also relatively high.
Key words:
polysemous word,
word sense disambiguation,
conditional random field (CRF),
high-frequency sense,
low-frequency sense
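The thresholding step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the threshold is derived from the probability values, so the selection rule below (take the lowest threshold whose accepted tags reach a target precision, which also gives the highest recall among qualifying thresholds) is an assumption made for illustration. The function name `choose_threshold`, the `min_precision` parameter, and the sample data are all hypothetical.

```python
from typing import List, Tuple


def choose_threshold(
    scored: List[Tuple[float, bool]], min_precision: float = 0.8
) -> Tuple[float, float, float]:
    """Pick a probability threshold from tagged development data.

    `scored` holds (probability, is_correct) pairs, one per tagged item.
    Returns (threshold, precision, recall) for the lowest threshold whose
    accepted items reach `min_precision`. Recall never increases as the
    threshold rises, so the lowest qualifying threshold keeps recall highest.
    This rule is an illustrative assumption, not the paper's stated method.
    """
    total_correct = sum(1 for _, ok in scored if ok)
    # Candidate thresholds are the distinct probabilities seen in the data.
    for threshold in sorted({p for p, _ in scored}):
        accepted = [ok for p, ok in scored if p >= threshold]
        true_positives = sum(accepted)
        precision = true_positives / len(accepted)
        recall = true_positives / total_correct if total_correct else 0.0
        if precision >= min_precision:
            return threshold, precision, recall
    raise ValueError("no threshold reaches the requested precision")


# Hypothetical tagging output: (model probability, tag was correct?)
sample = [
    (0.95, True), (0.90, True), (0.85, True), (0.70, False),
    (0.65, True), (0.40, False), (0.30, False),
]
threshold, precision, recall = choose_threshold(sample, min_precision=0.8)
```

In a model base of the kind described, each stored model file would be paired with the threshold chosen this way, and a tag would be accepted only when its probability is at or above that threshold.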
CHE Ling, ZHANG Yang-Sen. Construction of Condition Random Field Model Base Oriented to Word Sense Disambiguation[J]. Computer Engineering, 2012, 38(20): 152-155.