Computer Engineering ›› 2010, Vol. 36 ›› Issue (4): 17-19.

• Degree Paper

Model of Chinese Word Segmentation and Part-of-Speech Tagging

LIU Yao-feng, WANG Zhi-liang, WANG Chuan-jing   

  1. (School of Information Engineering, University of Science & Technology Beijing, Beijing 100083)
  • Received:1900-01-01 Revised:1900-01-01 Online:2010-02-20 Published:2010-02-20

Abstract: This paper proposes a model for Chinese word segmentation and part-of-speech tagging. In the segmentation stage, the N best segmentation results are retained as a candidate set. The final result is then selected from these candidates after unknown-word recognition and part-of-speech tagging. A Chinese lexical analyzer that performs automatic word segmentation and part-of-speech tagging is implemented on the basis of this model. In tests with training sets of different sizes, the analyzer reaches an accuracy of 98.34% for word segmentation and 96.07% for part-of-speech tagging, which supports the effectiveness of the method.

Key words: word segmentation, part-of-speech tagging, shortest path
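
The abstract does not describe how the N best candidates are found. The keyword "shortest path" suggests a search over a word graph, so the following is only a minimal sketch under that assumption, not the authors' implementation. It treats each dictionary word as an edge whose cost is the negative log of a unigram probability and keeps the n cheapest paths to every character position. The function name `n_best_segmentations`, the `unknown_prob` fallback for single characters, and the toy probabilities are all hypothetical.

```python
import heapq
import math


def n_best_segmentations(sentence, word_probs, n=3, unknown_prob=1e-8):
    """Return up to n lowest-cost segmentations as (cost, words) pairs.

    word_probs maps a word to an assumed unigram probability; the cost of a
    segmentation is the sum of -log(probability) over its words.
    """
    length = len(sentence)
    # best[i] holds up to n (cost, words) pairs covering sentence[:i].
    best = [[] for _ in range(length + 1)]
    best[0] = [(0.0, [])]
    for end in range(1, length + 1):
        candidates = []
        for start in range(end):
            word = sentence[start:end]
            if word in word_probs:
                prob = word_probs[word]
            elif end - start == 1:
                # Assumption: any single character is allowed at a high
                # cost, so every sentence has at least one segmentation.
                prob = unknown_prob
            else:
                continue
            cost = -math.log(prob)
            for prev_cost, prev_words in best[start]:
                candidates.append((prev_cost + cost, prev_words + [word]))
        # Keeping n paths per position is enough because costs are additive.
        best[end] = heapq.nsmallest(n, candidates, key=lambda item: item[0])
    return best[length]


if __name__ == "__main__":
    # Hypothetical toy dictionary, not data from the paper.
    toy_probs = {"研究": 0.02, "研究生": 0.005, "生命": 0.01,
                 "命": 0.001, "起源": 0.005}
    for cost, words in n_best_segmentations("研究生命起源", toy_probs, n=2):
        print(round(cost, 2), "/".join(words))
```

In a pipeline like the one the abstract outlines, the returned candidates would then be rescored after unknown-word recognition and part-of-speech tagging, and the best-scoring one kept; that rescoring step is not shown here.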
