
Computer Engineering, 2012, Vol. 38, Issue (13): 128-130. doi: 10.3969/j.issn.1000-3428.2012.13.038

• Artificial Intelligence and Recognition Technology

Chinese Named Entity Recognition Based on Part of Speech Feature with Edges

QIU Sha¹,³, WANG Fu-yan¹, SHEN Hao-ru¹, DUAN Bo¹, A Yuan¹, DING Hai-yan²

  (1. Institute of Information Technology, Kunming University, Kunming 650214, China; 2. Institute of Information, Yunnan University, Kunming 650091, China; 3. School of Computer Science, Fudan University, Shanghai 201203, China)
  • Received: 2011-08-23  Online: 2012-07-05  Published: 2012-07-05
  • About the authors: QIU Sha (1974-), female, lecturer, M.S.; main research interest: natural language processing. WANG Fu-yan, SHEN Hao-ru, DUAN Bo, A Yuan, DING Hai-yan: lecturers, M.S.
  • Funding:
    Scientific Research Project Fund of Kunming University (2009G012)

Abstract: Starting from the ways in which part of speech (PoS) can be expressed as features in this task, this paper studies the use of PoS features for Chinese named entity recognition (NER) at the character level, based on Conditional Random Fields (CRFs), and proposes a method that combines PoS and word boundary (word-edge) information into a single feature item. Under identical experimental settings, multiple Chinese NER experiments are run on a public corpus as sequence labeling tasks, each applying PoS features in a different way. A comparison of the results shows that the feature combining second-level PoS tags with word boundaries performs best in both system running performance and recognition effectiveness.

Key words: Chinese named entity recognition, Conditional Random Fields (CRFs), feature template, Part of Speech (PoS), word-edge, label set
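
As a rough illustration of the central idea in the abstract (merging a character's word-boundary position and its word's PoS tag into one feature item for character-level CRF labeling), the sketch below turns a word-segmented, PoS-tagged sentence into character rows. It is not the authors' implementation. Several details are assumptions not stated in the abstract: the B/M/E/S boundary scheme, a PKU/ICTCLAS-style tagset, treating the first letter of a tag as its "first-level" form, the hypothetical "B-nr" feature string, and the illustrative window template in `window_features`.

```python
from typing import List, Sequence, Tuple

# A word paired with its PoS tag. PKU/ICTCLAS-style tags are assumed here,
# e.g. "nr" = person name, "ns" = place name, "p" = preposition, "v" = verb.
TaggedWord = Tuple[str, str]
Row = Tuple[str, str]  # (character, combined PoS + boundary feature)


def boundary_tags(word: str) -> List[str]:
    """Position of each character in its word: B(egin), M(iddle), E(nd) or S(ingle)."""
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def pos_at_level(pos: str, level: int) -> str:
    """Assumed convention: level 1 keeps only the first letter ("nr" -> "n"),
    level 2 keeps the full tag ("nr")."""
    return pos[:1] if level == 1 else pos


def char_features(sentence: Sequence[TaggedWord], level: int = 2) -> List[Row]:
    """Emit one row per character whose feature merges the word-boundary
    position and the word's PoS tag into a single item such as "B-nr"."""
    rows: List[Row] = []
    for word, pos in sentence:
        tag = pos_at_level(pos, level)
        for ch, b in zip(word, boundary_tags(word)):
            rows.append((ch, f"{b}-{tag}"))
    return rows


def window_features(rows: Sequence[Row], i: int,
                    offsets: Sequence[int] = (-1, 0, 1)) -> List[str]:
    """Instantiate unigram templates over the feature column (column 1), in
    the spirit of CRF++ templates such as "U00:%x[-1,1]". Positions outside
    the sentence get a placeholder. The template set is illustrative only."""
    feats = []
    for k, d in enumerate(offsets):
        j = i + d
        value = rows[j][1] if 0 <= j < len(rows) else "_PAD_"
        feats.append(f"U{k:02d}:{value}")
    return feats


if __name__ == "__main__":
    sentence = [("张三", "nr"), ("在", "p"), ("北京", "ns"), ("工作", "v")]
    rows = char_features(sentence)
    for ch, feat in rows:
        print(f"{ch}\t{feat}")  # one character per line, columns tab-separated
    print()  # a blank line ends the sentence in CRF++-style training files
    print(window_features(rows, 0))
```

Passing `level=1` produces first-level features such as "B-n" instead of "B-nr", which is the kind of variant the abstract compares. Which granularity works better is the paper's empirical finding; this sketch only shows how such a combined feature column could be built.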
