Computer Engineering (计算机工程)

• Artificial Intelligence and Recognition Technology •

基于二阶HMM的中医诊断古文词性标注

刘博,杜建强,聂斌,刘蕾,张鑫,郝竹林   

  1. (江西中医药大学 计算机学院,南昌 330004)
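  • Received: 2016-05-24 Publication date: 2017-07-15 Release date: 2017-07-15
  • About the authors: LIU Bo (born 1990), male, master's student, main research interests: text mining and medical data mining; DU Jianqiang (corresponding author), professor, Ph.D.; NIE Bin, lecturer, M.S.; LIU Lei and ZHANG Xin, master's students; HAO Zhulin, teaching assistant.
  • Funding:
    National Natural Science Foundation of China (61363042, 61562045); Science and Technology Landing Program for Universities of Jiangxi Province (LD12038); Jiangxi Provincial Graduate Innovation Fund (YC2015-S350).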

Part-of-speech Tagging of Traditional Chinese Medicine Diagnosis Ancient Prose Based on Second-order HMM

LIU Bo, DU Jianqiang, NIE Bin, LIU Lei, ZHANG Xin, HAO Zhulin

  1. (School of Computer Science, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China)
  • Received: 2016-05-24 Online: 2017-07-15 Published: 2017-07-15

Abstract: Part-of-speech tagging with the traditional hidden Markov model (HMM) captures only limited contextual information. To address this, the paper proposes an improved second-order HMM that takes contextual relations into account in order to tag Traditional Chinese Medicine (TCM) diagnosis texts accurately. Array underflow that arises during training is corrected by new-word processing and by introducing scale factors. Experimental results show that the improved second-order HMM achieves higher part-of-speech tagging accuracy than the traditional HMM.

Key words: Traditional Chinese Medicine (TCM) diagnosis ancient prose, part-of-speech tagging, contextual relations, scale factor, second-order hidden Markov model (HMM), new-word processing
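
The abstract names two ingredients, a second-order HMM and scale factors against underflow, without giving formulas. The following is a minimal sketch of how the two commonly fit together, not the authors' implementation: a forward pass in which each tag depends on the two preceding tags and every step is renormalized by a scale factor. The function name, the parameter layout (`pi`, `a1`, `a2`, `b`) and the toy probabilities are all assumptions made for illustration.

```python
import math


def scaled_forward_2nd_order(obs, pi, a1, a2, b):
    """Log-likelihood of `obs` under a second-order HMM with per-step scaling.

    Hypothetical parameter layout (not taken from the paper):
      pi[i]       : P(tag_1 = i)
      a1[i][j]    : P(tag_2 = j | tag_1 = i)
      a2[i][j][k] : P(tag_t = k | tag_{t-2} = i, tag_{t-1} = j)
      b[i][o]     : P(word = o | tag = i)
    Assumes every prefix of `obs` has non-zero probability.
    """
    n = len(pi)

    # Word 1: ordinary forward vector over single tags.
    alpha1 = [pi[i] * b[i][obs[0]] for i in range(n)]
    c = sum(alpha1)                      # scale factor for step 1
    log_lik = math.log(c)
    alpha1 = [x / c for x in alpha1]
    if len(obs) == 1:
        return log_lik

    # Word 2: forward table indexed by (previous tag, current tag).
    alpha = [[alpha1[i] * a1[i][j] * b[j][obs[1]] for j in range(n)]
             for i in range(n)]
    c = sum(map(sum, alpha))
    log_lik += math.log(c)
    alpha = [[x / c for x in row] for row in alpha]

    # Words 3..T: each tag is conditioned on the two preceding tags.
    for o in obs[2:]:
        new = [[sum(alpha[i][j] * a2[i][j][k] for i in range(n)) * b[k][o]
                for k in range(n)]
               for j in range(n)]
        c = sum(map(sum, new))           # rescale so the table sums to 1
        log_lik += math.log(c)
        alpha = [[x / c for x in row] for row in new]

    return log_lik


# Toy model with 2 tags and 2 word types (made-up numbers).
pi = [0.6, 0.4]
a1 = [[0.7, 0.3], [0.4, 0.6]]
a2 = [[[0.8, 0.2], [0.5, 0.5]],
      [[0.3, 0.7], [0.1, 0.9]]]
b = [[0.9, 0.1], [0.2, 0.8]]

print(scaled_forward_2nd_order([0, 1, 1, 0], pi, a1, a2, b))
# A 4000-word sequence would underflow to 0.0 as a raw product,
# but the scaled version still returns a finite log-likelihood.
print(scaled_forward_2nd_order([0, 1] * 2000, pi, a1, a2, b))
```

Because the product of the per-step scale factors equals the sequence probability, summing their logarithms recovers the log-likelihood while the forward table itself never leaves a numerically safe range. How the paper combines this with new-word processing is not stated in the abstract and is not modelled here.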

CLC Number: