
Computer Engineering ›› 2013, Vol. 39 ›› Issue (6): 266-271, 286. doi: 10.3969/j.issn.1000-3428.2013.06.059

• Artificial Intelligence and Recognition Technology •

Algorithm of Beijing Opera Organization Names Entity Recognition Based on HMM

LE Juan 1,2, ZHAO Xi 3

  1. College of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  2. Beijing Vocational Institute of Local Opera and Arts, Beijing 100068, China
  3. Teachers’ College, Beijing Union University, Beijing 100011, China
  • Received: 2012-05-28 Online: 2013-06-15 Published: 2013-06-14
  • About the authors: LE Juan (born 1978), female, senior lecturer, master's-level postgraduate; main research interests: information retrieval and software engineering. ZHAO Xi, laboratory technician, master's-level postgraduate.
  • Supported by:
    Beijing Outstanding Talent Training Program (2012D002002000001); Beijing Vocational College Teacher Quality Improvement Project

Abstract: To address the low efficiency of organization named entity recognition, this paper proposes a Hidden Markov Model (HMM)-based algorithm for recognizing organization named entities in the Beijing opera domain. An HMM tags the part of speech of each word produced by text segmentation in order to resolve ambiguity, and the Viterbi algorithm computes the most probable part-of-speech sequence for a given segmentation result. Following custom name recognition rules, the left and right boundaries of an organization name are located with the help of organization prefix and suffix lexicons. An automaton algorithm then recognizes the organization named entities in the corpus, and the newly found words are loaded into the segmentation dictionary. In an open test on a Beijing opera corpus, the recognition accuracy reaches up to 99%.

Key words: open-domain, Named Entity Recognition (NER), Hidden Markov Model (HMM), Viterbi algorithm, rule tree
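
The part-of-speech step described in the abstract can be illustrated with a minimal Viterbi sketch. This is not the authors' implementation: the tag set (`n`, `v`), the probability tables, the `floor` smoothing value, and the function name `viterbi` are all hypothetical and chosen only to show how the most probable tag sequence is recovered for one segmentation result.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-8):
    """Return the most probable tag sequence for `words` under a first-order HMM.

    Probabilities missing from a table are replaced by `floor` (an assumed
    smoothing value) so that log() is always defined.
    """
    if not words:
        return []

    def lp(p):
        return math.log(p if p > 0 else floor)

    # best[i][tag] = (log-probability of the best path ending in `tag` at
    #                 position i, previous tag on that path)
    best = [{
        t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)), None)
        for t in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            # Choose the predecessor tag that maximises the path score.
            prev, score = max(
                ((p, best[i - 1][p][0] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + lp(emit_p[t].get(words[i], 0)), prev)
        best.append(column)

    # Backtrack from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    path.reverse()
    return path


# Toy model (all numbers are illustrative, not taken from the paper).
tags = ["n", "v"]
start_p = {"n": 0.7, "v": 0.3}
trans_p = {"n": {"n": 0.4, "v": 0.6}, "v": {"n": 0.7, "v": 0.3}}
emit_p = {
    "n": {"北京": 0.4, "京剧院": 0.4, "演出": 0.2},
    "v": {"演出": 0.9, "参观": 0.1},
}

# "演出" can be a noun or a verb; the sequence context resolves it to a verb.
print(viterbi(["北京", "京剧院", "演出"], tags, start_p, trans_p, emit_p))
```

Working in log space avoids numeric underflow on long sentences. A real tagger would estimate the three tables from an annotated corpus rather than hard-coding them.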

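
The boundary step in the abstract (prefix and suffix lexicons driving an automaton) can be sketched as follows. The abstract does not give the actual rules or lexicons, so everything here is an assumption for illustration: the function name `find_org_names`, the two-state design, the `max_len` cutoff, and the sample lexicon entries.

```python
def find_org_names(tokens, prefixes, suffixes, max_len=5):
    """Scan segmented tokens with a two-state automaton and return
    candidate organization names.

    State "outside": `start` is None, waiting for a prefix token.
    State "inside":  `start` holds the index of the prefix token, waiting
                     for a suffix token within `max_len` tokens.
    """
    names = []
    start = None
    for i, tok in enumerate(tokens):
        if start is None:
            if tok in prefixes:
                start = i  # left boundary found
        elif tok in suffixes:
            names.append("".join(tokens[start:i + 1]))  # right boundary found
            start = None
        elif i - start + 1 >= max_len:
            # Candidate grew too long without a suffix: abandon it, but
            # restart immediately if the current token is itself a prefix.
            start = i if tok in prefixes else None
    return names


# Hypothetical lexicon entries and input, for illustration only.
prefixes = {"北京", "上海"}
suffixes = {"院", "团", "剧团"}
tokens = ["昨晚", "北京", "京剧", "院", "演出", "了", "新", "剧目"]

new_words = find_org_names(tokens, prefixes, suffixes)
print(new_words)

# Recognized names could then be added to the segmentation dictionary,
# modeled here as a plain set.
dictionary = {"京剧", "演出"}
dictionary.update(new_words)
```

This sketch ignores part-of-speech constraints. In the approach the abstract describes, the tags produced by the HMM step would presumably restrict which tokens may appear between the two boundaries.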