Computer Engineering ›› 2023, Vol. 49 ›› Issue (4): 114-119. doi: 10.19678/j.issn.1000-3428.0064087

• Artificial Intelligence and Pattern Recognition •

Multi-Modal Named Entity Recognition Method Based on Multi-Task Learning

LI Xiaoteng1, ZHANG Panpan1, GOU Zhinan2, GAO Kai1

  1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018, China;
  2. School of Information Technology, Hebei University of Economics and Business, Shijiazhuang 050061, China
  • Received: 2022-03-03 Revised: 2022-05-22 Published: 2023-04-07
  • About the authors: LI Xiaoteng (born 1994), male, master's student, main research interest: natural language processing; ZHANG Panpan, master's student; GOU Zhinan, lecturer, Ph.D.; GAO Kai (corresponding author), professor.
  • Funding:
    General Program of the Natural Science Foundation of Hebei Province (F2022208006); Science and Technology Research Project of Higher Education Institutions of Hebei Province (QN2020198).

Abstract: Traditional multi-modal Named Entity Recognition (NER) methods fail to fuse text and image information effectively and struggle to distinguish easily confused entities. To address these problems, a multi-modal NER method based on multi-task learning is proposed, in which a contrast fusion auxiliary task promotes the fusion of text and image information and an entity clustering auxiliary task improves the model's ability to distinguish confusable entities. First, a BERT pre-trained language model and a ResNet model map the raw text and the image, respectively, to feature vectors, and a cross-modal Transformer fuses the two modalities. Second, on top of the multi-modal NER task, the contrast fusion auxiliary task is added to promote the fusion of text and image information, and the entity clustering auxiliary task is added to learn the differences between entity categories. Finally, a Conditional Random Field (CRF) layer learns the contextual transition probabilities and outputs the optimal prediction. Experimental results on the public Twitter-2017 dataset show that the proposed method achieves higher precision, recall, and F1-score than the baseline methods, with an F1-score of up to 85.59%, indicating that the contrast fusion and entity clustering auxiliary tasks improve the model's entity recognition performance.
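The final step, in which the CRF layer outputs the optimal prediction, can be illustrated with a small sketch. The abstract does not say how decoding is done; the code below assumes a linear-chain CRF decoded with the standard Viterbi algorithm, and it is not taken from the authors' implementation. The function name `viterbi_decode`, the three-tag set, and all scores are hypothetical values chosen for illustration.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   : score of tag j at token t (assumed to come from
                        the fused text-image features).
    transitions[i][j] : score of moving from tag i to tag j.
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # best[j] is the best score of any path that ends in tag j so far.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_tags):
            # Pick the previous tag that maximizes the path score into j.
            prev = max(range(num_tags),
                       key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the whole path.
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical toy input: three tokens, three tags.
TAGS = ["O", "B-PER", "I-PER"]
emissions = [
    [0.0, 2.0, 0.5],
    [1.0, 0.0, 0.5],   # taken alone, this token prefers "O"
    [1.5, 0.0, 0.0],
]
transitions = [
    [0.5, 0.0, -5.0],  # O -> I-PER is strongly penalized
    [0.0, -1.0, 1.5],  # B-PER -> I-PER is rewarded
    [0.5, -1.0, 0.5],
]
decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)  # ['B-PER', 'I-PER', 'O']
```

In this toy input, choosing the best tag for each token independently would label the second token "O". The transition scores change that to "I-PER", which is the kind of contextual constraint a CRF layer is meant to capture.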

Key words: Named Entity Recognition (NER), multi-task learning, multi-modal information, contrastive learning, clustering

CLC Number: