Computer Engineering ›› 2010, Vol. 36 ›› Issue (4): 17-19. doi: 10.3969/j.issn.1000-3428.2010.04.006

• Doctoral Dissertation •

Model of Chinese Word Segmentation and Part-of-Speech Tagging

LIU Yao-feng, WANG Zhi-liang, WANG Chuan-jing

  1. (School of Information Engineering, University of Science & Technology Beijing, Beijing 100083)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2010-02-20 Published: 2010-02-20

Abstract: This paper proposes a model for Chinese word segmentation and part-of-speech tagging. In the segmentation stage, the N best segmentation results are selected as a candidate set. Unknown word recognition and part-of-speech tagging are then applied, and the final result is chosen from the candidate set. A Chinese lexical analyzer that performs automatic word segmentation and automatic part-of-speech tagging is implemented on the basis of this model. Tests with training sets of different sizes show that the analyzer reaches a word segmentation accuracy of 98.34% and a part-of-speech tagging accuracy of 96.07%, which demonstrates the effectiveness of the method.

Keywords: word segmentation, part-of-speech tagging, shortest path
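
The abstract names the approach (N best candidates, shortest path) but not its details, so the following is only a minimal sketch of how an N-best candidate set could be produced by an N-shortest-path search over a word lattice. The function name `n_best_segmentations`, the dictionary `DEMO_COSTS`, its cost values, and the `unknown_cost` fallback are all assumptions made for illustration; they are not taken from the paper.

```python
import heapq

# Hypothetical dictionary: word -> cost (lower is better, e.g. a negative log
# probability). The words and values are invented for this illustration.
DEMO_COSTS = {
    "研究": 1.0, "研究生": 1.5, "生命": 1.0,
    "生": 3.0, "命": 3.0, "起源": 1.0,
}


def n_best_segmentations(text, word_costs, n=3, unknown_cost=10.0):
    """Return up to n lowest-cost segmentations of text as (cost, words) pairs.

    Each character boundary is a node and each dictionary word is an edge,
    so a segmentation is a path from node 0 to node len(text).
    """
    max_len = max((len(w) for w in word_costs), default=1)
    # best[i] holds up to n (cost, words) pairs that cover text[:i].
    best = [[] for _ in range(len(text) + 1)]
    best[0] = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in word_costs:
                cost = word_costs[piece]
            elif len(piece) == 1:
                # Assumed fallback: an out-of-dictionary single character is
                # allowed at a high cost so every position stays reachable.
                cost = unknown_cost
            else:
                continue
            for prev_cost, prev_words in best[start]:
                candidates.append((prev_cost + cost, prev_words + [piece]))
        # Keeping the n cheapest partial paths per node is enough to recover
        # the n cheapest complete paths, because costs are additive.
        best[end] = heapq.nsmallest(n, candidates, key=lambda c: c[0])
    return best[len(text)]
```

Calling `n_best_segmentations("研究生命起源", DEMO_COSTS, n=2)` yields two candidate segmentations ranked by cost. In a pipeline of the kind the abstract describes, such candidates would then be passed to unknown word recognition and part-of-speech tagging, and the final result would be selected from among them; that rescoring step is not sketched here.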
