Computer Engineering ›› 2019, Vol. 45 ›› Issue (5): 308-314. doi: 10.19678/j.issn.1000-3428.0052810

• Development Research and Engineering Application •

Commercial intelligence entity recognition model based on BiLSTM-CRF

ZHANG Yingcheng1, YANG Yang2, JIANG Rui3,4, QUAN Bing5, ZHANG Lijun3, REN Xiaolei6

  1. College of Computer Science, Sichuan University, Chengdu 610065, China; 2. Sichuan Institute of Computer Sciences, Chengdu 610041, China; 3. Chengdu Ruibeiyingte Information Technology Co., Ltd., Chengdu 610041, China; 4. Sichuan Zhiqian Science and Technology Co., Ltd., Chengdu 610041, China; 5. China Mobile (Suzhou) Software Technology Co., Ltd., Suzhou, Jiangsu 215000, China; 6. Sichuan Heima Digital Technology Co., Ltd., Luzhou, Sichuan 646000, China
  • Received: 2018-10-08 Online: 2019-05-15 Published: 2019-05-15
  • About the authors: ZHANG Yingcheng (b. 1994), male, master's student, main research interests: natural language processing and artificial intelligence; YANG Yang, JIANG Rui, QUAN Bing, and ZHANG Lijun, engineers with master's degrees; REN Xiaolei, engineer.
  • Supported by:

    Sichuan Science and Technology Program (18PTDJ0085, 2019YFH0075, 2018GZDZX0030); Luzhou Science and Technology Program (2017CDLZ-G25).

Abstract:

A BiLSTM-CRF model is constructed by combining the Conditional Random Field (CRF) language model with a Bidirectional Long Short-Term Memory (BiLSTM) network to extract three types of entity information from commercial intelligence text sequences: the tenderer, the bidding agent, and the bidding number. The normalized bidding text sequence is vectorized character by character. The BiLSTM network captures the forward and backward features of the serialized text, and the CRF layer extracts the corresponding entities from these bidirectional text features. Experimental results show that, compared with the traditional machine learning algorithm CRF, the proposed model improves the precision, recall, and F1 value over the three entity types by 15.21%, 12.06%, and 13.70% on average, respectively.

Key words: Conditional Random Field (CRF), Bidirectional Long Short-Term Memory (BiLSTM) network, language model, Named Entity Recognition (NER), deep learning
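
The final step described in the abstract, where a CRF layer picks entities out of per-character features, can be illustrated with a small Viterbi decoder. This is only a sketch under stated assumptions and not the authors' implementation: the BIO tag names (`O`, `B-NUM`, `I-NUM`), the hand-written emission scores standing in for BiLSTM outputs, the transition scores, and the sample string are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical tag set: "NUM" stands in for the bidding-number entity type.
TAGS = ["O", "B-NUM", "I-NUM"]


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][tag] is the per-character score (a stand-in for BiLSTM output).
    transitions[(prev, cur)] is the CRF transition score; missing pairs count as 0.0.
    """
    if not emissions:
        return []
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag] = best previous tag for `tag` at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(
                tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0)
            )
            scores[cur] = (
                best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def extract_entities(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_type, text) spans from a character-level BIO tagging."""
    entities, cur_type, cur_chars = [], None, []
    for ch, tag in zip(chars, tags):
        if tag.startswith("I-") and cur_type == tag[2:]:
            cur_chars.append(ch)  # continue the open span
            continue
        if cur_type is not None:  # any other tag closes the open span
            entities.append((cur_type, "".join(cur_chars)))
            cur_type, cur_chars = None, []
        if tag.startswith("B-"):
            cur_type, cur_chars = tag[2:], [ch]
    if cur_type is not None:
        entities.append((cur_type, "".join(cur_chars)))
    return entities


# Toy input: one emission dict per character of a made-up string.
chars = list("No:A12")
emissions = [
    {"O": 2.0, "B-NUM": 0.0, "I-NUM": 0.0},  # N
    {"O": 2.0, "B-NUM": 0.0, "I-NUM": 0.0},  # o
    {"O": 2.0, "B-NUM": 0.0, "I-NUM": 0.0},  # :
    {"O": 0.5, "B-NUM": 2.0, "I-NUM": 1.0},  # A
    {"O": 0.2, "B-NUM": 0.3, "I-NUM": 2.0},  # 1
    {"O": 1.0, "B-NUM": 0.0, "I-NUM": 1.2},  # 2
]
# Assumed transition scores: an I- tag may not directly follow O.
transitions = {
    ("O", "I-NUM"): -10.0,
    ("B-NUM", "I-NUM"): 1.0,
    ("I-NUM", "I-NUM"): 1.0,
}

path = viterbi_decode(emissions, transitions, TAGS)
entities = extract_entities(chars, path)
print(path)      # ['O', 'O', 'O', 'B-NUM', 'I-NUM', 'I-NUM']
print(entities)  # [('NUM', 'A12')]
```

In a trained model both score tables would be learned jointly; the transition scores are what let the CRF layer rule out invalid tag sequences that per-character scores alone might produce.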

CLC Number: