
Computer Engineering ›› 2023, Vol. 49 ›› Issue (4): 101-107,113. doi: 10.19678/j.issn.1000-3428.0064951

• Artificial Intelligence and Pattern Recognition •

Retrieval Method of 3D Models Driven by Multi-modal Feature Fusion and Word Embedding

GUAN Ripeng, KUANG Liqun, JIAO Shichao, XIONG Fengguang, HAN Xie   

  1. School of Data Science and Technology, North University of China, Taiyuan 030051, China
  • Received:2022-06-10 Revised:2022-08-01 Published:2022-08-04

  • Author biographies: GUAN Ripeng (1996-), male, M.S. candidate; his research interests include artificial intelligence and computer vision. KUANG Liqun (corresponding author), professor, Ph.D. JIAO Shichao, Ph.D. candidate. XIONG Fengguang, associate professor, Ph.D. HAN Xie, professor, Ph.D.
  • Supported by: National Natural Science Foundation of China (62106238); Scientific Research Project for Returned Overseas Scholars of Shanxi Province (2020-113); Special Project for Guiding the Transformation of Scientific and Technological Achievements of Shanxi Province (202104021301055).

Abstract: In 3D model classification and retrieval based on point clouds and images, existing feature fusion methods neglect both the feature information within each modality and the complementary information between modalities, causing fusion feature loss; moreover, the lack of high-dimensional correlation between classification labels and predicted features results in low retrieval accuracy. To address these problems, a network structure jointly driven by multi-modal features and word embedding is proposed for the classification and retrieval of 3D models. A feature extractor extracts 3D model features from point clouds and views, and the features of the different modalities are aligned through a shared space. For modal fusion, the cosine similarity between the modalities is computed to enhance the modal features, which are then concatenated to obtain the fusion features. For feature classification, a unified representation and classification retrieval of 3D model features is realized by establishing a high-dimensional correlation between the word embedding model and the classification labels. Experiments on the ModelNet10 and ModelNet40 datasets show that the network achieves a mean Average Precision (mAP) of 92.9% and 91.5%, respectively, and can obtain accurate 3D model feature descriptors. Compared with VoxNet, SCIF, MVCNN, and other retrieval methods, the proposed method significantly improves the retrieval and classification accuracies of 3D models.
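The fusion step described in the abstract, in which cosine similarity between the aligned point-cloud and view features enhances each modality before concatenation, can be sketched as follows. This is a minimal illustration using NumPy; the function names and the additive re-weighting scheme are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Normalize feature vectors to unit length for cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def fuse_modalities(point_feat, view_feat):
    """Cross-modal fusion sketch: compute the per-sample cosine similarity
    between the aligned point-cloud and view features, use it to enhance
    each modality with the other, then concatenate the enhanced features."""
    p = l2_normalize(point_feat)
    v = l2_normalize(view_feat)
    # per-sample cosine similarity between the two modality embeddings, shape (N, 1)
    cos_sim = np.sum(p * v, axis=-1, keepdims=True)
    # enhance each modality with similarity-weighted complementary information
    p_enhanced = point_feat + cos_sim * view_feat
    v_enhanced = view_feat + cos_sim * point_feat
    # splice (concatenate) the enhanced features into one fused descriptor, shape (N, 2D)
    return np.concatenate([p_enhanced, v_enhanced], axis=-1)

# toy example: a batch of 4 samples with 128-dimensional aligned features
rng = np.random.default_rng(0)
point_feat = rng.standard_normal((4, 128))
view_feat = rng.standard_normal((4, 128))
fused = fuse_modalities(point_feat, view_feat)
print(fused.shape)  # (4, 256)
```

In the paper's full pipeline, the fused descriptor would then be matched against word embeddings of the class labels rather than one-hot targets, so that semantically related classes remain close in the prediction space.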

Key words: 3D model, feature fusion, word embedding, deep learning, feature extraction

