Computer Engineering ›› 2007, Vol. 33 ›› Issue (15): 98-100. doi: 10.3969/j.issn.1000-3428.2007.15.034

• Software Technology and Database •

Research on Chinese Word Segmentation Based on Matrix Restraint

ZHANG Su-zhi, LIU Fang-mei

  1. (College of Computer and Communications Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002)
  • Online: 2007-08-05 Published: 2007-08-05

Abstract: Word segmentation and ambiguity resolution are important factors in the accuracy of information retrieval systems. This paper presents a Chinese word segmentation algorithm that uses a constraint matrix based on grammar and semantics. Built on grammar and syntax, the algorithm analyzes ambiguous fields from the perspective of context in order to improve segmentation accuracy. The system segments an input string of continuous Chinese characters, outputs the segmented word string, and produces a dictionary. The output is then processed with the Modern Chinese Grammar Information Dictionary (《现代汉语语法信息词典》). Experimental results show that segmentation accuracy improves by about 10%.

Key words: Chinese word segmentation, matrix constraint, disambiguation, word segmentation system
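
The abstract does not spell out how the constraint matrix is built or applied, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes that ambiguous fields are detected by comparing forward and backward maximum matching (a swapped-in, standard technique), and that the constraint matrix records which part-of-speech tag may follow which. The lexicon, the tag set, and every matrix value below are hypothetical toy data.

```python
# Illustrative sketch only: LEXICON, TAG_INDEX and CONSTRAINT are hypothetical
# toy data, and bidirectional maximum matching is an assumed way of finding
# ambiguous fields; the paper's actual procedure is not given in the abstract.

MAX_LEN = 3  # longest word length considered by the toy lexicon

# word -> part-of-speech tag (n = noun, v = verb, u = particle)
LEXICON = {
    "研究": "v", "研究生": "n", "生命": "n",
    "命": "n", "的": "u", "起源": "n",
}

TAG_INDEX = {"n": 0, "v": 1, "u": 2}

# CONSTRAINT[i][j] == 1 means tag i may be followed by tag j (toy values).
CONSTRAINT = [
    # n  v  u
    [0, 1, 1],  # n
    [1, 0, 1],  # v
    [1, 0, 0],  # u
]


def forward_max_match(text, lexicon, max_len=MAX_LEN):
    """Greedy left-to-right segmentation, preferring the longest lexicon word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # single characters always match
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, lexicon, max_len=MAX_LEN):
    """Greedy right-to-left segmentation, preferring the longest lexicon word."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    return words[::-1]


def constraint_score(words, lexicon):
    """Count adjacent tag pairs that the constraint matrix permits."""
    # Unknown words are treated as nouns; this is an assumption of the sketch.
    tags = [lexicon.get(w, "n") for w in words]
    return sum(CONSTRAINT[TAG_INDEX[a]][TAG_INDEX[b]]
               for a, b in zip(tags, tags[1:]))


def segment(text, lexicon=LEXICON):
    """Return the candidate segmentation that best satisfies the constraints."""
    fwd = forward_max_match(text, lexicon)
    bwd = backward_max_match(text, lexicon)
    if fwd == bwd:  # both directions agree, so no ambiguity was detected
        return fwd
    # On a tie, max() keeps the first candidate, i.e. the backward result.
    return max((bwd, fwd), key=lambda ws: constraint_score(ws, lexicon))


print(segment("研究生命的起源"))
```

For the input 研究生命的起源, the two matching directions disagree (研究生 / 命 versus 研究 / 生命), and the toy matrix prefers the verb-noun reading because it forbids the noun-noun pair. A real system would need a far larger lexicon and a matrix derived from actual grammatical and semantic knowledge.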
