Research on Entity Recognition and Tagging in Fiscal and Taxation Domain

Computer Engineering, 2020, Vol. 46, Issue (5): 312-320. doi: 10.19678/j.issn.1000-3428.0054483

• Development Research and Engineering Application •

QIU Yu^1,2,3, CHENG Li^1,2,3

1. The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China;
3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China

Received: 2019-04-03  Revised: 2019-05-15  Published: 2019-05-31
About the authors: QIU Yu (born 1988), male, Ph.D. candidate, whose main research interests are artificial intelligence and natural language processing; CHENG Li, research fellow and doctoral supervisor.
Funding: National "Thousand Talents Plan" project (Y32H251201); West Light Foundation of the Chinese Academy of Sciences (2017-XBZG-BR-001).

Abstract

Entities in specific domains have more complex and varied structures and types than those in the general domain, so traditional named entity recognition methods struggle to achieve satisfactory results. To address this problem, this paper takes the fiscal and taxation domain as an example and studies domain entity recognition and tagging, with the aim of dynamically expanding a knowledge base. A hierarchical set of entity types is defined according to the characteristics of the domain, and a training corpus is obtained by distant supervision. A deep neural network model that combines character features and word features is used to recognize entity boundaries. Entity type tagging is treated as a multi-label, multi-class classification task, and a method based on ensemble learning is proposed for it. Experimental results on real datasets show that the proposed method achieves higher precision, recall and F value than baseline methods such as logistic regression and support vector machine.

Key words: knowledge base expansion, entity recognition, entity tagging, deep learning, ensemble learning
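The pipeline summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the knowledge-base entries, the type names such as `tax/tax_type`, and all function names are hypothetical, and plain majority voting stands in for whatever ensemble strategy the paper actually uses. The sketch shows (1) distant supervision, where known entity names are matched against raw text to produce character-level BIO tags, (2) voting over multi-label predictions in a type hierarchy, where a fine-grained label also counts as a vote for its parent types, and (3) micro-averaged precision, recall and F1.

```python
from collections import Counter

# Hypothetical mini knowledge base: entity surface form -> hierarchical type.
KB = {
    "增值税": "tax/tax_type",
    "国家税务总局": "organization/government",
}

def distant_label(text, kb):
    """Tag each character with B/I/O by longest-match lookup against kb."""
    tags = ["O"] * len(text)
    names = sorted(kb, key=len, reverse=True)  # prefer longer entity names
    i = 0
    while i < len(text):
        match = next((n for n in names if text.startswith(n, i)), None)
        if match is None:
            i += 1
            continue
        tags[i] = "B"
        for j in range(i + 1, i + len(match)):
            tags[j] = "I"
        i += len(match)
    return tags

def with_ancestors(label):
    """Expand 'a/b' into {'a', 'a/b'} so a label implies its parent types."""
    parts = label.split("/")
    return {"/".join(parts[:k]) for k in range(1, len(parts) + 1)}

def vote(predictions):
    """Majority vote over the label sets predicted by several classifiers."""
    counts = Counter()
    for labels in predictions:
        expanded = set()
        for label in labels:
            expanded |= with_ancestors(label)
        counts.update(expanded)
    # Keep only labels supported by more than half of the classifiers.
    return {lab for lab, c in counts.items() if c > len(predictions) / 2}

def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F1 over lists of label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    print(distant_label("国家税务总局发布增值税公告", KB))
    print(vote([{"tax/tax_type"}, {"tax/levy"}, {"organization"}]))
```

Because every vote for a fine-grained type is also a vote for its ancestors, the voted label set is always consistent with the hierarchy: when classifiers disagree on the fine type but agree on the coarse one, only the coarse type is kept.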

CLC number: TP391.4

QIU Yu, CHENG Li. Research on Entity Recognition and Tagging in Fiscal and Taxation Domain[J]. Computer Engineering, 2020, 46(5): 312-320.

http://www.ecice06.com/CN/Y2020/V46/I5/312
