Computer Engineering ›› 2009, Vol. 35 ›› Issue (21): 197-199. doi: 10.3969/j.issn.1000-3428.2009.21.066

• Artificial Intelligence and Recognition Technology •

A Part-of-Speech Tagging System for Hierarchical Classification Labels

潘 炜,沈 超   

  1. (Department of Computer Science and Engineering, Fudan University, Shanghai 200433)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-11-05 Published: 2009-11-05

POS Tagging System on Hierarchical Classification Labels

PAN Wei, SHEN Chao   

  1. (Department of Computer Science and Engineering, Fudan University, Shanghai 200433)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-11-05 Published: 2009-11-05

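Abstract (translated from the Chinese): The growth of the Web has produced large amounts of irregular short-phrase text. To address the poor performance of existing part-of-speech tagging tools on corpora of hierarchical classification labels, a simple algorithm based on maximum entropy is proposed. It introduces a new tag and combines two types of features extracted from WordNet and Wikipedia, which improves performance. Experimental results show that the system reaches an accuracy of 93.77% on DMoz.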

Keywords (translated from the Chinese): hierarchical classification labels, part-of-speech tagging, maximum entropy model

Abstract: The growth of the Internet has given rise to many unconventional datasets, such as hierarchical classification labels, on which current part-of-speech (POS) tagging tools perform poorly. To address this problem, this paper proposes a simple method based on the maximum entropy model (MEM) framework. The method introduces a new tag and two new types of features extracted from WordNet and Wikipedia, and yields a significant improvement over existing tools. Experimental results show that the accuracy reaches 93.77% on DMoz.

Key words: hierarchical classification labels, part-of-speech (POS) tagging, maximum entropy model (MEM)
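
The abstract names the general technique (a maximum entropy tagger with features drawn from a lexicon and an encyclopedia) but not its details. The following is a minimal sketch of that general idea only, not the authors' system: the tag set, the feature templates, the tiny lookup tables standing in for WordNet and Wikipedia, and the training labels are all assumptions made up for illustration, and the paper's new tag is not reproduced because the abstract does not define it.

```python
import math
from collections import defaultdict

TAGS = ["NOUN", "ADJ", "CONJ", "VERB"]  # assumed toy tag set

# Hypothetical stand-ins for the two external resources; a real system would
# query WordNet and Wikipedia instead of these tiny hand-made tables.
WORDNET_POS = {
    "arts": {"NOUN"}, "games": {"NOUN"}, "kids": {"NOUN"}, "teens": {"NOUN"},
    "online": {"ADJ"}, "shopping": {"NOUN", "VERB"}, "video": {"NOUN"},
}
WIKI_TITLES = {"video games", "online shopping"}


def features(tokens, i):
    """Return the active binary features for token i of a label."""
    word = tokens[i].lower()
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    feats = ["bias", "word=" + word, "suffix=" + word[-2:], "prev=" + prev]
    # Lexicon feature: parts of speech the lookup table allows for this word.
    feats += ["wn=" + pos for pos in sorted(WORDNET_POS.get(word, ()))]
    # Encyclopedia feature: the token starts or ends a known title bigram.
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else None
    if prev + " " + word in WIKI_TITLES:
        feats.append("wiki=title_end")
    if nxt is not None and word + " " + nxt in WIKI_TITLES:
        feats.append("wiki=title_start")
    return feats


def probabilities(weights, feats):
    """Maximum entropy model: p(tag | feats) is a softmax over summed weights."""
    scores = {t: sum(weights[(f, t)] for f in feats) for t in TAGS}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def train(labelled, epochs=200, rate=0.1):
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for tokens, tags in labelled:
            for i, gold in enumerate(tags):
                feats = features(tokens, i)
                probs = probabilities(weights, feats)
                for t in TAGS:
                    # Gradient: observed count minus expected count.
                    step = rate * ((1.0 if t == gold else 0.0) - probs[t])
                    for f in feats:
                        weights[(f, t)] += step
    return weights


def tag(weights, tokens):
    """Tag each token independently with its most probable tag."""
    result = []
    for i in range(len(tokens)):
        probs = probabilities(weights, features(tokens, i))
        result.append(max(TAGS, key=probs.get))
    return result


# Made-up training labels in the style of a web directory.
TRAIN = [
    (["Arts"], ["NOUN"]),
    (["Online", "Shopping"], ["ADJ", "NOUN"]),
    (["Video", "Games"], ["NOUN", "NOUN"]),
    (["Kids", "and", "Teens"], ["NOUN", "CONJ", "NOUN"]),
]
WEIGHTS = train(TRAIN)
print(tag(WEIGHTS, ["Arts", "and", "Crafts"]))
```

Each token is classified independently here to keep the sketch short; a practical tagger would usually also condition on previously predicted tags and train on far more data.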
