
Computer Engineering, 2024, Vol. 50, Issue 8: 13-21. doi: 10.19678/j.issn.1000-3428.0068508

• Artificial Intelligence and Pattern Recognition •

Named Entity Recognition of Vascular Surgery Based on Glyph Features

Huaqing ZHANG1, Zhangtao XIA2, Xiaoqing LU2, Jijun TONG2,*

  1. Department of Clinical Medical Engineering, The Second Affiliated Hospital Zhejiang University School of Medicine, Hangzhou 310009, Zhejiang, China
  2. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China
  • Received: 2023-10-07  Issue published: 2024-08-15  Released online: 2024-08-09
  • Corresponding author: Jijun TONG
  • Funding: Zhejiang Provincial Natural Science Foundation (LQ22F010006); Zhejiang Provincial Basic Public Welfare Research Program (LTGY23H170004)


Abstract:

As the core of healthcare informatization, Electronic Medical Records (EMRs) contain numerous valuable medical entities, and Named Entity Recognition (NER) on EMRs can help advance medical research. To address the scarcity of research data and the difficulty of recognizing complex entities in vascular surgery EMRs, a small-scale specialized dataset is constructed from real clinical data collected in the vascular surgery department of a Grade-A tertiary hospital, and an NER model based on glyph features is proposed. First, MacBERT (Masked Language Model as correction BERT, i.e., Bidirectional Encoder Representations from Transformers) generates dynamic character vectors, and glyph information is incorporated along two dimensions: the Chinese-character four-corner code and the Wubi code. The resulting text representations are then fed into a Bidirectional Gated Recurrent Unit (BiGRU) and a Dilated Gated Convolutional Neural Network (DGCNN) for feature extraction, and the two outputs are concatenated. Finally, a multi-head self-attention mechanism captures the relationships between elements of the sequence, and a Conditional Random Field (CRF) decodes the labels. Experimental results show that the proposed model achieves a precision of 96.45%, a recall of 97.77%, and an F1 score of 97.10% on the self-constructed vascular surgery dataset, outperforming all comparison models in entity recognition.
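The abstract reports precision, recall and F1 but does not say how they are computed. A common NER convention, assumed here rather than taken from the paper, is exact-match entity-level scoring: a predicted entity counts as correct only if both its span and its type match a gold entity, and F1 is the harmonic mean 2PR/(P+R). The Python sketch below implements that convention on hypothetical spans (the types `DIS`, `SYM` and `DRUG` and the helper names are illustrative, not the paper's label set or code) and recomputes F1 from the reported P and R. The result, about 97.11%, is consistent with the reported 97.10% once the rounding of the published P and R is taken into account.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start, end, type), end exclusive,
# as character offsets into one EMR sentence.
Entity = Tuple[int, int, str]


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 = 2PR / (P + R); defined as 0 when both P and R are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def entity_prf(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1, as fractions in [0, 1]."""
    tp = len(gold & pred)  # span AND type must both match to count as correct
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, harmonic_f1(precision, recall)


if __name__ == "__main__":
    gold = {(0, 2, "DIS"), (5, 8, "SYM"), (10, 12, "DRUG")}
    pred = {(0, 2, "DIS"), (5, 8, "SYM"), (10, 12, "SYM"), (14, 16, "DRUG")}
    p, r, f = entity_prf(gold, pred)  # 2 of 4 predictions correct, 2 of 3 gold found
    print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")  # P=0.5000 R=0.6667 F1=0.5714

    # Recompute F1 (in %) from the precision and recall reported in the abstract.
    print(f"{harmonic_f1(96.45, 97.77):.2f}")  # 97.11
```

The third prediction shows why type matters under this convention: its span is right but its type is wrong, so it counts as both a false positive and a missed gold entity.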

Keywords: Electronic Medical Record (EMR), vascular surgery, Named Entity Recognition (NER), feature fusion, deep learning
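The last stage of the described model is CRF label decoding. For a linear-chain CRF, decoding is usually done with the Viterbi algorithm, which applies dynamic programming over per-position emission scores and tag-to-tag transition scores to find the single highest-scoring tag sequence. The pure-Python sketch below is a generic illustration, not the authors' implementation. The three-tag `O`/`B`/`I` scheme, the score values, the penalty `NEG` and the function name `viterbi_decode` are all hypothetical. In the described model, the emission scores would presumably come from a projection of the self-attention output. The toy input shows why the CRF layer helps: a greedy per-character argmax yields the invalid sequence O, I, O, while the penalty on the O-to-I transition makes Viterbi return B, I, O.

```python
from typing import List, Sequence, Tuple

NEG = -1e4  # large penalty used to forbid a transition (hypothetical value)


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
    end: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the best tag-index path and its total score.

    emissions[t][j]   score of tag j at position t
    transitions[i][j] score of moving from tag i to tag j
    start[j], end[j]  score of beginning / ending the sequence with tag j
    """
    num_tags = len(start)
    # score[j]: best score of any partial path that ends in tag j at the current position
    score = [start[j] + emissions[0][j] for j in range(num_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(num_tags):
            # Best previous tag i for reaching tag j at position t.
            candidates = [score[i] + transitions[i][j] for i in range(num_tags)]
            best_i = max(range(num_tags), key=candidates.__getitem__)
            new_score.append(candidates[best_i] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    final = [score[j] + end[j] for j in range(num_tags)]
    best_last = max(range(num_tags), key=final.__getitem__)
    # Follow the back-pointers from the last position back to the first.
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, final[best_last]


TAGS = ["O", "B", "I"]  # hypothetical BIO tag set
TRANSITIONS = [
    [0.0, 0.0, NEG],  # O -> I is invalid in BIO
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
START = [0.0, 0.0, NEG]  # a sequence cannot start with I
END = [0.0, 0.0, 0.0]
EMISSIONS = [  # toy per-character scores for a 3-character input
    [2.0, 1.5, 0.0],
    [0.0, 0.0, 3.0],
    [1.0, 0.0, 0.0],
]

if __name__ == "__main__":
    greedy = [max(range(3), key=row.__getitem__) for row in EMISSIONS]
    path, best = viterbi_decode(EMISSIONS, TRANSITIONS, START, END)
    print([TAGS[i] for i in greedy])  # ['O', 'I', 'O'] (invalid under BIO)
    print([TAGS[i] for i in path], best)  # ['B', 'I', 'O'] 5.5
```

In a trained CRF the transition matrix is learned rather than hand-set, but invalid transitions typically end up with strongly negative scores, which has the same effect as the hand-set penalty here.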