
Computer Engineering (计算机工程) ›› 2023, Vol. 49 ›› Issue (2): 90-97. doi: 10.19678/j.issn.1000-3428.0063841

• Artificial Intelligence and Pattern Recognition •

Unsupervised Cross-Modal Hashing Based on Feature Fusion

LIANG Tianyou, MENG Min, WU Jigang   

  1. School of Computer, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2022-01-25  Revised: 2022-03-16  Published: 2022-07-19
  • Author biographies: LIANG Tianyou (born 1997), male, master's student; his main research interest is cross-modal retrieval. MENG Min (corresponding author), associate professor. WU Jigang, professor.
  • Supported by:
    National Natural Science Foundation of China (62172109).

Unsupervised Cross-Modal Hashing Based on Feature Fusion

LIANG Tianyou, MENG Min, WU Jigang   

  1. School of Computer, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2022-01-25  Revised: 2022-03-16  Published: 2022-07-19

Abstract: Existing Unsupervised Cross-Modal Hashing (UCMH) methods focus mainly on constructing the similarity matrix and constraining the structure of the common representation space, while overlooking two important problems. First, they extract an independent representation for each modality for retrieval, without exploiting the complementary information across modalities. Second, the structural information in the pre-extracted features is not fully suited to the cross-modal retrieval task and may transfer erroneous information. For the first problem, a multimodal representation fusion structure is proposed: by fusing the embedded features of different modalities, it effectively integrates information from each modality and improves the expressiveness of the hash codes; a cross-modal generation mechanism is also introduced to handle query data with a missing modality. For the second problem, a dynamic similarity-matrix adjustment strategy is proposed: during training, the learned modality embeddings adaptively and progressively refine the similarity matrix, alleviating the bias of the pre-extracted features toward the original dataset, making the matrix better suited to cross-modal retrieval, and effectively avoiding overfitting. Experiments on the widely used Flickr25k and NUS-WIDE datasets show that, for three hash code lengths, the model built with this method improves the mean Average Precision (mAP) over the DGCPN model by 1.43%, 1.82%, and 1.52% on Flickr25k and by 3.72%, 3.77%, and 1.99% on NUS-WIDE, which verifies the effectiveness of the proposed method.
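The paper itself provides no code; the following is only a minimal PyTorch sketch of a feature-fusion hashing module of the kind the abstract describes. The class name `FusionHashNet`, the layer sizes, and the fusion operator (concatenation of the two modality embeddings followed by a fully connected layer with tanh relaxation) are all assumptions for illustration, not the authors' actual architecture.

```python
# Minimal sketch of a feature-fusion hashing module (assumed design, not the paper's exact model).
import torch
import torch.nn as nn

class FusionHashNet(nn.Module):
    """Fuses image and text embeddings into a shared hash code.

    dim_img / dim_txt: dimensions of the pre-extracted features (assumed values)
    dim_emb: dimension of each modality embedding
    n_bits:  hash code length
    """
    def __init__(self, dim_img=4096, dim_txt=1386, dim_emb=512, n_bits=64):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Linear(dim_img, dim_emb), nn.ReLU())
        self.txt_enc = nn.Sequential(nn.Linear(dim_txt, dim_emb), nn.ReLU())
        # Fusion: concatenate the two modality embeddings, then project to hash space.
        self.fusion = nn.Sequential(nn.Linear(2 * dim_emb, n_bits), nn.Tanh())

    def forward(self, x_img, x_txt):
        z_img = self.img_enc(x_img)          # image embedding
        z_txt = self.txt_enc(x_txt)          # text embedding
        h = self.fusion(torch.cat([z_img, z_txt], dim=1))  # fused, tanh-relaxed code in (-1, 1)
        return z_img, z_txt, h

    def hash(self, x_img, x_txt):
        # Binarize the relaxed code at retrieval time.
        return torch.sign(self.forward(x_img, x_txt)[2])

# Toy usage with random "pre-extracted" features.
net = FusionHashNet()
codes = net.hash(torch.randn(8, 4096), torch.randn(8, 1386))
print(codes.shape)  # torch.Size([8, 64])
```

The cross-modal generation mechanism mentioned in the abstract (for queries missing one modality) is not shown here; in a sketch like this it would amount to an extra network that predicts the missing modality's embedding from the available one before fusion.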

Keywords: unsupervised, cross-modal, retrieval, hashing, deep learning

Abstract: Most Unsupervised Cross-Modal Hashing (UCMH) methods focus on constructing a similarity matrix from the pre-extracted features and on controlling the structure of the common representation space. However, two critical problems remain. First, the complementarity among different modalities is ignored in most studies. Second, the structural information in the pre-extracted features is only partially compatible with the cross-modal retrieval task, which may cause negative transfer. To address the first problem, this study proposes a multimodal fusion architecture. By fusing the embeddings from different modalities, the information from each modality can be integrated effectively, thus improving the expressiveness of the hash codes. A cross-modal generation mechanism is also proposed to serve as an out-of-sample solution for query data with a missing modality. For the second problem, this study proposes a dynamic updating strategy for the similarity matrix, which refines it gradually with the learned embeddings during training, relieving the bias of the pre-extracted features toward the original dataset and making the similarity matrix better suited to cross-modal retrieval. Experiments are conducted on two widely used datasets, Flickr25k and NUS-WIDE. The proposed method achieves mAP improvements over the DGCPN model of 1.43%, 1.82%, and 1.52% on Flickr25k for three different hash code lengths, and of 3.72%, 3.77%, and 1.99% on NUS-WIDE, which demonstrates its efficacy.
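To illustrate the dynamic similarity-matrix updating idea, the sketch below blends the similarity computed from the pre-extracted features with the similarity implied by the current modality embeddings, letting the weight of the learned part grow over training. The convex-blend rule, the linear schedule, and the function names are assumptions for illustration; the paper's actual update strategy may differ.

```python
# Minimal sketch of a dynamic similarity-matrix update (assumed rule: a convex blend
# between the pre-extracted-feature similarity and the learned-embedding similarity).
import torch
import torch.nn.functional as F

def cosine_similarity_matrix(x: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarity of the row vectors in x."""
    x = F.normalize(x, dim=1)
    return x @ x.t()

def update_similarity(s_pre: torch.Tensor,
                      z_img: torch.Tensor,
                      z_txt: torch.Tensor,
                      epoch: int,
                      max_epoch: int) -> torch.Tensor:
    """Gradually shift trust from the pre-extracted similarity s_pre
    to the similarity implied by the current modality embeddings."""
    s_learned = 0.5 * (cosine_similarity_matrix(z_img) + cosine_similarity_matrix(z_txt))
    lam = epoch / max_epoch          # weight of the learned similarity grows over training
    return (1.0 - lam) * s_pre + lam * s_learned

# Toy usage with random features and embeddings.
s_pre = cosine_similarity_matrix(torch.randn(16, 4096))
s_now = update_similarity(s_pre, torch.randn(16, 512), torch.randn(16, 512), epoch=5, max_epoch=50)
print(s_now.shape)  # torch.Size([16, 16])
```

Updating the target similarity with the model's own embeddings is what lets the method reduce the bias of the pre-extracted features toward their original dataset, at the cost of having to schedule how quickly the learned similarity is trusted.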

Key words: unsupervised, cross-modal, retrieval, hashing, deep learning

CLC Number: