
Computer Engineering, 2024, Vol. 50, Issue (5): 190-199. doi: 10.19678/j.issn.1000-3428.0067501

• Graphics and Image Processing •

Image-Based Cross-Domain Visual-Data Retrieval with Deep Semantic Correlation Learning

JIAO Shichao1,2,3, GUAN Ripeng1,2,3, KUANG Liqun1,2,3, XIONG Fengguang1,2,3, HAN Xie1,2,3

  1. School of Computer Science and Technology, North University of China, Taiyuan 030051, Shanxi, China;
  2. Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, Shanxi, China;
  3. Shanxi Province's Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, Shanxi, China
  • Received: 2023-04-25 Revised: 2023-07-03 Published: 2023-08-09
  • Contact: JIAO Shichao, E-mail: kuang@nuc.edu.cn
  • Funding:
    National Natural Science Foundation of China (62272426, 62106238); Shanxi Province Science and Technology Major Special Program, "Open Competition" Project (202201150401021); Shanxi Province Special Fund for Guiding the Transformation of Scientific and Technological Achievements (202104021301055); Shanxi Province Research Project for Returned Overseas Scholars (2020-113); Shanxi Province Basic Research Program (202203021222027).

Abstract: Image-based cross-domain retrieval of visual data aims to find cross-domain images and three-dimensional (3D) model data that are semantically consistent with, or similar in appearance to, an input image. The main challenge is handling the modal heterogeneity between cross-domain data so that query images can be matched to target objects across domains. Existing methods construct a common feature space and apply a domain-adaptation or deep metric learning algorithm to achieve domain alignment or semantic alignment of cross-domain features, but their effectiveness has been verified only on a single type of cross-domain retrieval task. To address this, a method based on deep semantic correlation learning is proposed that applies to multiple types of image-based cross-domain visual-data retrieval tasks. First, heterogeneous networks extract the initial visual features of the cross-domain data. Next, a common feature space is constructed and the initial features are mapped into it for the subsequent domain and semantic alignments. Finally, intra-domain discriminative learning, inter-domain consistency learning, and cross-domain correlation learning are performed to eliminate the heterogeneity among cross-domain features, explore the semantic correlation among them, and generate robust, unified feature representations for retrieval. Experimental results show that the method achieves mean Average Precision (mAP) values of 0.448, 0.689, and 0.874 on the TU-Berlin, IM2MN, and MI3DOR datasets, respectively, clearly outperforming the compared methods.

Key words: cross-domain retrieval, feature alignment, domain alignment, sketch, real image, three-dimensional model, correlation learning
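
For readers unfamiliar with the mAP figure reported in the abstract, the following is a minimal sketch of how mean Average Precision is commonly computed for a retrieval task. It assumes that query and gallery items have already been embedded in a common feature space, that the gallery is ranked by cosine similarity, and that a gallery item counts as relevant when it shares the query's class label. The function names, the similarity measure, and the toy data are illustrative assumptions, not the authors' implementation or evaluation protocol.

```python
from math import sqrt
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (assumed ranking score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """AP of one ranked list; True marks a gallery item relevant to the query."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    query_feats: Sequence[Sequence[float]],
    query_labels: Sequence[str],
    gallery_feats: Sequence[Sequence[float]],
    gallery_labels: Sequence[str],
) -> float:
    """Rank the whole gallery for each query and average the per-query AP."""
    aps: List[float] = []
    for q_feat, q_label in zip(query_feats, query_labels):
        scores = [cosine_similarity(q_feat, g) for g in gallery_feats]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        aps.append(average_precision([gallery_labels[i] == q_label for i in order]))
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical toy data: two image queries and a three-item gallery.
toy_query_feats = [[1.0, 0.0], [0.0, 1.0]]
toy_query_labels = ["chair", "lamp"]
toy_gallery_feats = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
toy_gallery_labels = ["lamp", "lamp", "chair"]

toy_map = mean_average_precision(
    toy_query_feats, toy_query_labels, toy_gallery_feats, toy_gallery_labels
)
print(round(toy_map, 3))
```

In this sketch a query with no relevant gallery item contributes an AP of 0. Benchmark protocols differ on details such as this and on the similarity measure, so figures computed this way are not directly comparable with the reported ones unless the paper's protocol is followed.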

CLC Number: