Named Entity Recognition Method for Unsafe Underground Behaviors in Coal Mines

doi:10.19678/j.issn.1000-3428.0069917

Abstract:

To improve the efficiency of underground safety management and support safe coal mine production, this study constructs a corpus of unsafe underground behaviors in coal mines, annotated with the BIO labeling strategy and containing 8 entity categories and 2,359 samples. The corpus draws on the relevant standards and norms of the coal mine industry and on domain knowledge of underground unsafe behavior. To address insufficient use of semantic information, imbalanced entity distribution, and fuzzy entity boundaries in named entity recognition for unsafe behavior in coal mines, this study proposes a named entity recognition model based on Global Pointer and adversarial training. First, an improved hierarchical RoBERTa model uses multi-layer semantic information to enhance the vectorization of underground unsafe behavior text, and adversarial training perturbs the word embedding layer to alleviate data imbalance and improve model robustness. Second, a Bidirectional Gated Recurrent Unit (BiGRU) in the feature extraction layer captures the contextual semantic features of the corpus more effectively and strengthens semantic associations within the text. Finally, Global Pointer is constructed in the decoding layer to obtain more accurate entity boundaries. The proposed model is evaluated on a self-built small-sample dataset of unsafe underground behaviors in coal mines. The results show that its precision, recall, and F1 value are 78.77%, 78.20%, and 78.48%, respectively, which are 2.27, 0.63, and 1.45 percentage points higher than those of the BERT-Global Pointer model. The findings provide a basis for constructing a knowledge graph of unsafe underground behaviors.
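The reported scores can be cross-checked with a few lines of arithmetic. The sketch below assumes that the reported F1 value is the harmonic mean of the reported precision and recall; the abstract does not state the averaging scheme, so this is an assumption. The baseline scores are not given in the abstract: they are derived here by subtracting the stated gains, and the function and variable names are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Scores reported in the abstract for the proposed model (percent).
proposed_p, proposed_r, proposed_f1 = 78.77, 78.20, 78.48

# Gains over BERT-Global Pointer reported in the abstract (percentage points).
gain_p, gain_r, gain_f1 = 2.27, 0.63, 1.45

# Baseline scores implied by the gains (derived here, not reported directly).
baseline_p = round(proposed_p - gain_p, 2)
baseline_r = round(proposed_r - gain_r, 2)
baseline_f1 = round(proposed_f1 - gain_f1, 2)

print("F1 recomputed for the proposed model:", round(f1_score(proposed_p, proposed_r), 2))
print("Implied baseline P / R / F1:", baseline_p, baseline_r, baseline_f1)
print("F1 recomputed for the implied baseline:", round(f1_score(baseline_p, baseline_r), 2))
```

Under this assumption the recomputed F1 matches the reported 78.48, and the implied baseline of 76.50 precision and 77.57 recall is consistent with an F1 of 77.03.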

Key words: unsafe underground behavior, named entity recognition, RoBERTa model, adversarial training, Global Pointer model
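As a rough illustration of span-based decoding in the Global Pointer style, the sketch below scans one score matrix per entity type and keeps every span whose start does not exceed its end and whose score is above a threshold of 0. The tokens, the entity type names (`PERSON`, `EQUIPMENT`), and the scores are invented for illustration; they are not the paper's 8 categories or its model outputs.

```python
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index), both inclusive


def decode_spans(scores: Dict[str, List[List[float]]], threshold: float = 0.0) -> List[Span]:
    """Keep every span (i, j) with i <= j whose score exceeds the threshold."""
    spans: List[Span] = []
    for entity_type, matrix in scores.items():
        for i, row in enumerate(matrix):
            # Only the upper triangle is valid: a span cannot end before it starts.
            for j in range(i, len(row)):
                if row[j] > threshold:
                    spans.append((entity_type, i, j))
    return sorted(spans, key=lambda s: (s[1], s[2], s[0]))


# Hypothetical toy input: four tokens and two made-up entity types.
tokens = ["worker", "not", "wearing", "helmet"]
n = len(tokens)
scores = {
    "PERSON": [[-1.0] * n for _ in range(n)],
    "EQUIPMENT": [[-1.0] * n for _ in range(n)],
}
scores["PERSON"][0][0] = 2.3      # span "worker"
scores["EQUIPMENT"][3][3] = 1.7   # span "helmet"
scores["EQUIPMENT"][3][1] = 5.0   # start > end, so this cell is never read

for entity_type, start, end in decode_spans(scores):
    print(entity_type, tokens[start:end + 1])
```

Because each candidate span is scored as a whole, the start and end of an entity are decided jointly rather than tag by tag, which is the property the abstract relies on for sharper entity boundaries.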

FU Yan, LIU Peiyi, YE Ou. Named Entity Recognition Method for Unsafe Underground Behaviors in Coal Mines[J]. Computer Engineering, 2026, 52(4): 424-432.

https://www.ecice06.com/CN/Y2026/V52/I4/424

Figures/Tables: 9

Fig.1 Overall structure of the model

Fig.2 Hierarchical RoBERTa structure

Fig.3 Structure of the BiGRU model

Fig.4 Schematic diagram of Global Pointer identifying entities

