
Computer Engineering ›› 2023, Vol. 49 ›› Issue (4): 101-107,113. doi: 10.19678/j.issn.1000-3428.0064951

• Artificial Intelligence and Pattern Recognition •

Retrieval Method of 3D Models Driven by Multi-modal Feature Fusion and Word Embedding

GUAN Ripeng, KUANG Liqun, JIAO Shichao, XIONG Fengguang, HAN Xie   

  1. School of Data Science and Technology, North University of China, Taiyuan 030051, China
  • Received:2022-06-10 Revised:2022-08-01 Published:2022-08-04

  • Author biographies: GUAN Ripeng (1996-), male, M.S. candidate; his research interests include artificial intelligence and computer vision. KUANG Liqun (corresponding author), professor, Ph.D. JIAO Shichao, Ph.D. candidate. XIONG Fengguang, associate professor, Ph.D. HAN Xie, professor, Ph.D.
  • Supported by: National Natural Science Foundation of China (62106238); Scientific Research Project for Returned Overseas Scholars of Shanxi Province (2020-113); Special Project for Guiding the Transformation of Scientific and Technological Achievements of Shanxi Province (202104021301055).

Abstract: In 3D model classification and retrieval based on point clouds and images, existing feature fusion methods neglect both the feature information within each modality and the complementary information between modalities, causing fusion feature loss; moreover, the lack of high-dimensional correlation between classification labels and predicted features results in low retrieval accuracy. To address these problems, a network structure jointly driven by multi-modal features and word embedding is proposed for the classification and retrieval of 3D models. A feature extractor extracts 3D model features from point clouds and views, and the features of the different modalities are aligned through a shared space. For modal fusion, the cosine similarity between the modalities is computed to enhance the modal features, which are then concatenated to obtain the fusion features. For feature classification, a unified representation and classification retrieval of 3D model features is realized by establishing a high-dimensional correlation between the word embedding model and the classification labels. Experiments on the ModelNet10 and ModelNet40 datasets show that the network achieves a mean Average Precision (mAP) of 92.9% and 91.5%, respectively, and can obtain accurate 3D model feature descriptors. Compared with VoxNet, SCIF, MVCNN, and other retrieval methods, the proposed method significantly improves the retrieval and classification accuracies of 3D models.
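The fusion step described in the abstract, in which cosine similarity between the aligned point-cloud and view features enhances each modality before concatenation, can be sketched as follows. This is a minimal illustration using NumPy; the function names and the additive re-weighting scheme are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Normalize feature vectors to unit length for cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def fuse_modalities(point_feat, view_feat):
    """Cross-modal fusion sketch: compute the per-sample cosine similarity
    between the aligned point-cloud and view features, use it to enhance
    each modality with the other, then concatenate the enhanced features."""
    p = l2_normalize(point_feat)
    v = l2_normalize(view_feat)
    # per-sample cosine similarity between the two modality embeddings, shape (N, 1)
    cos_sim = np.sum(p * v, axis=-1, keepdims=True)
    # enhance each modality with similarity-weighted complementary information
    p_enhanced = point_feat + cos_sim * view_feat
    v_enhanced = view_feat + cos_sim * point_feat
    # splice (concatenate) the enhanced features into one fused descriptor, shape (N, 2D)
    return np.concatenate([p_enhanced, v_enhanced], axis=-1)

# toy example: a batch of 4 samples with 128-dimensional aligned features
rng = np.random.default_rng(0)
point_feat = rng.standard_normal((4, 128))
view_feat = rng.standard_normal((4, 128))
fused = fuse_modalities(point_feat, view_feat)
print(fused.shape)  # (4, 256)
```

In the paper's full pipeline, the fused descriptor would then be matched against word embeddings of the class labels rather than one-hot targets, so that semantically related classes remain close in the prediction space.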

Key words: 3D model, feature fusion, word embedding, deep learning, feature extraction

