Computer Engineering (计算机工程)

• Artificial Intelligence and Recognition Technology

A Domain Feature Word Vector Description Method for Military Texts

QIN Jie, CAO Lei, PENG Hui, LAI Jun

  1. (College of Command Information System, PLA University of Science and Technology, Nanjing 210007, China)
  • Received: 2015-06-18 Online: 2016-08-15 Published: 2016-08-15
  • About the authors: QIN Jie (b. 1990), male, master's student, main research interest: text recommendation; CAO Lei (corresponding author), professor; PENG Hui, lecturer, Ph.D.; LAI Jun, lecturer, M.S.

Abstract:

Military texts contain many named entities, and their feature words are strongly domain-specific. To address these characteristics, this paper proposes a domain feature word vector description method. The method compresses the vector space in two ways: by optimizing word segmentation and by filtering domain feature words. For segmentation, it refines the extraction rules for four important types of named entity (time expressions, place names, troop designations, and weapons and equipment) and extends the segmentation dictionary. For filtering, it improves the domain feature word selection algorithm that combines domain relevance and domain consistency, widening the gap between domain feature words and common words so that domain feature words can be filtered further. Experimental results show that, after segmentation is optimized, the method extracts the named entities and some of the specialized vocabulary in military texts and reduces the number of feature words. The improved selection algorithm raises precision and recall by 20% and 16.7%, respectively, and the feature word vectors generated by the proposed method are strongly domain-specific.

Key words: military text, named entity, vector space, word segmentation, domain feature word
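
The abstract names domain relevance and domain consistency but does not give their formulas. As a rough illustration only, the sketch below uses a formulation common in terminology extraction: relevance as the ratio of a word's relative frequency in the domain corpus to its highest relative frequency in either corpus, and consistency as the normalized entropy of the word's distribution over domain documents. The function names, the weight `alpha`, the `threshold`, and the toy corpora are all assumptions made for this example; this is not the improved algorithm proposed in the paper.

```python
import math


def domain_relevance(term, domain_docs, background_docs):
    """Relative frequency of `term` in the domain corpus divided by the larger
    of its domain and background relative frequencies (range 0..1).
    Assumed formulation, not taken from the paper."""
    def rel_freq(docs):
        total = sum(len(doc) for doc in docs)
        count = sum(doc.count(term) for doc in docs)
        return count / total if total else 0.0

    p_domain = rel_freq(domain_docs)
    p_background = rel_freq(background_docs)
    top = max(p_domain, p_background)
    return p_domain / top if top else 0.0


def domain_consistency(term, domain_docs):
    """Normalized entropy of the term's counts across domain documents:
    1.0 when spread evenly, 0.0 when it occurs in a single document."""
    counts = [doc.count(term) for doc in domain_docs]
    total = sum(counts)
    if total == 0 or len(domain_docs) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c)
    return entropy / math.log(len(domain_docs))


def select_feature_words(domain_docs, background_docs, alpha=0.5, threshold=0.8):
    """Keep words whose weighted relevance/consistency score reaches `threshold`.
    `alpha` and `threshold` are hypothetical tuning parameters."""
    vocabulary = {word for doc in domain_docs for word in doc}
    scores = {
        word: alpha * domain_relevance(word, domain_docs, background_docs)
        + (1 - alpha) * domain_consistency(word, domain_docs)
        for word in vocabulary
    }
    return {word: score for word, score in scores.items() if score >= threshold}


# Toy, already-segmented documents (each document is a list of words).
domain = [["tank", "brigade", "the"], ["tank", "missile", "the"]]
background = [["the", "weather", "the"], ["the", "market"]]
selected = select_feature_words(domain, background)
```

In this toy setup a word that appears in every domain document and never in the background corpus scores highest, a common word is penalized by its background frequency, and a word confined to a single domain document is penalized by its low consistency.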

CLC number: