Term Recognition in the Electric Power Domain Based on Dynamic Domain Graphs and Collaborative Large and Small Models

doi:10.19678/j.issn.1000-3428.0252291

Abstract

Abstract: This paper studies automatic term recognition using the electric power domain as a case study, aiming to address the term recognition challenges the power industry faces during its digital transformation. The industry suffers from data silos and from knowledge that is difficult to activate and reuse, which calls for more efficient ways of turning term entities in documents into actionable knowledge that supports decision-making and technological innovation. To address the difficulty of identifying specialized terms and discovering novel ones, a term recognition method based on a dynamic domain graph and the collaboration of large and small models is proposed. It targets the two stages of the task separately: candidate term extraction, to raise recall, and term filtering and classification, to raise precision. First, an initial knowledge graph is built from an existing term base. Nodes related to the target text are then queried and filtered by a model using term features, and the resulting retrieval-augmented prompts help a large language model extract candidate terms. Next, a deep learning model for term classification is obtained through adversarial training, and the dynamic term knowledge graph is iteratively updated from its classification results, forming a positive feedback loop. Experimental results show that the method's accuracy, recall, and F1 score improve steadily over the iterations, finally reaching 0.8647, 0.8565, and 0.8542, respectively, and that the method outperforms other term recognition methods on all three metrics.
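The loop described in the abstract (retrieve related graph nodes, prompt a large model for candidates, classify them with a small model, feed accepted terms back into the graph) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: `TermGraph`, `build_prompt`, `recognize`, `toy_extract`, and `toy_classify` are all hypothetical names, the large language model and the adversarially trained classifier are replaced by trivial stand-ins, and the model-based filtering of retrieved nodes is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class TermGraph:
    """Toy stand-in for the dynamic term knowledge graph: term -> related terms."""
    edges: dict = field(default_factory=dict)

    def add_term(self, term):
        self.edges.setdefault(term, set())

    def link(self, a, b):
        self.add_term(a)
        self.add_term(b)
        self.edges[a].add(b)
        self.edges[b].add(a)

    def retrieve(self, text):
        """Return known terms that occur in the text, plus their graph neighbours."""
        hits = {t for t in self.edges if t in text}
        neighbours = set().union(*(self.edges[t] for t in hits))
        return sorted(hits | neighbours)


def build_prompt(text, hints):
    """Retrieval-augmented prompt: related known terms are passed along as hints."""
    return f"Known related terms: {', '.join(hints)}\nExtract candidate terms from: {text}"


def recognize(texts, graph, extract, classify, rounds=2):
    """Iterate extraction and classification, growing the graph with accepted terms."""
    accepted = set()
    for _ in range(rounds):
        for text in texts:
            hints = graph.retrieve(text)
            prompt = build_prompt(text, hints)
            for cand in extract(prompt, text):   # large-model stage (stubbed here)
                if classify(cand):               # small-model stage (stubbed here)
                    accepted.add(cand)
                    graph.add_term(cand)
                    for hint in hints:
                        if hint != cand:
                            graph.link(cand, hint)
    return accepted


# Hypothetical stand-ins: a real system would call an LLM and a trained classifier.
def toy_extract(prompt, text):
    words = text.split()
    return [" ".join(pair) for pair in zip(words, words[1:])]  # all word bigrams


GOLD = {"relay protection", "circuit breaker"}


def toy_classify(candidate):
    return candidate in GOLD


graph = TermGraph()
graph.add_term("circuit breaker")  # seed from an existing term base
texts = ["the relay protection device trips the circuit breaker"]
terms = recognize(texts, graph, toy_extract, toy_classify)
```

After the run, `graph` contains the newly accepted term linked to the seed term, so a later query for text mentioning either one retrieves both as hints. That growth of the hint set is the feedback effect the iteration relies on.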

FENG Guoping (冯国平), CHEN Zhijian (陈志坚), LIN Zhiyu (林志煜), HONG Liang (洪亮). Term Recognition in the Electric Power Domain Based on Dynamic Domain Graphs and Collaborative Large and Small Models[J]. Computer Engineering (计算机工程), doi: 10.19678/j.issn.1000-3428.0252291.

