Image-Based Cross-Domain Visual-Data Retrieval with Deep Semantic Correlation Learning

doi:10.19678/j.issn.1000-3428.0067501

Abstract

Image-based cross-domain retrieval of visual data aims to find cross-domain images and three-dimensional (3D) models that are semantically consistent with, or similar in appearance to, an input image. The main challenge is handling the modal heterogeneity between cross-domain data so that query images can be matched to target objects. Existing methods construct a common feature space and apply domain-adaptation or deep metric learning algorithms to achieve domain or semantic alignment of cross-domain features, but their effectiveness has been verified on only a single type of cross-domain retrieval task. This paper proposes a method based on deep semantic correlation learning that applies to multiple types of image-based cross-domain visual-data retrieval tasks. First, heterogeneous networks extract the initial visual features of the cross-domain data. Next, a common feature space is constructed and the initial features are mapped into it for subsequent domain and semantic alignment. Finally, intra-domain discriminative learning, inter-domain consistency learning, and cross-domain correlation learning are used to eliminate the heterogeneity among cross-domain features, explore their semantic correlations, and generate robust, unified feature representations for retrieval. Experimental results show that the method achieves mean Average Precision (mAP) values of 0.448, 0.689, and 0.874 on the TU-Berlin, IM2MN, and MI3DOR datasets, respectively, clearly outperforming the compared methods.

Key words: cross-domain retrieval, feature alignment, domain alignment, sketch, real image, three-dimensional model, correlation learning
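
The abstract reports results as mean Average Precision (mAP). As a reading aid, the following is a minimal sketch of how mAP is commonly computed for a retrieval task. It is not the authors' evaluation code: the function names and the toy rankings are hypothetical, and it assumes that each query's ranked list covers the whole gallery, so every relevant item appears somewhere in the list.

```python
from typing import Sequence


def average_precision(relevant: Sequence[bool]) -> float:
    """AP for one query; relevant[i] tells whether the item at rank i+1 matches."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    # Assumption: all relevant items are in the list, so `hits` is the
    # total number of relevant items for this query.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """mAP: the mean of the per-query AP values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical toy data: two queries with ranked relevance flags.
toy_rankings = [
    [True, False, True],  # relevant items at ranks 1 and 3
    [False, True],        # relevant item at rank 2
]
toy_map = mean_average_precision(toy_rankings)
```

Under this definition, a score such as 0.874 means that relevant items tend to sit near the top of each ranked list, averaged over all queries.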

CLC number: TP391.4

JIAO Shichao, GUAN Ripeng, KUANG Liqun, XIONG Fengguang, HAN Xie. Image-Based Cross-Domain Visual-Data Retrieval with Deep Semantic Correlation Learning[J]. Computer Engineering, 2024, 50(5): 190-199.

https://www.ecice06.com/CN/Y2024/V50/I5/190

