Computer Engineering ›› 2025, Vol. 51 ›› Issue (2): 322-334. doi: 10.19678/j.issn.1000-3428.0069685

• Graphics and Image Processing •

Multimodal 3D Model Retrieval Based on Compact Center Loss

LONG Liye1,2,3, JIAO Shichao1,2,3,*, GUO Lei1,2,3, HAN Xie1,2,3, KUANG Liqun1,2,3

  1. School of Computer Science and Technology, North University of China, Taiyuan 030051, Shanxi, China
    2. Shanxi Provincial Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, Shanxi, China
    3. Shanxi Province's Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, Shanxi, China
  • Received: 2024-04-02 Online: 2025-02-15 Published: 2024-09-20
  • Contact: JIAO Shichao
  • Supported by: National Natural Science Foundation of China (62272426); Shanxi Province Science and Technology Major Special Program ("Unveiling and Commanding" Project) (202201150401021); Shanxi Province Special Guidance Fund for the Transformation of Scientific and Technological Achievements (202104021301055); Natural Science Foundation of Shanxi Province (202303021212189, 202303021211153, 202203021222027)

Abstract:

With the continuous advancement of 3D model classification and retrieval tasks, multimodal feature fusion has emerged as a key technique for enhancing model performance and enriching shape representation. Existing multimodal-based 3D model retrieval methods primarily focus on directly fusing multiple global features and aligning them with label information using cross-entropy loss, effectively transforming the retrieval task into a classification problem. However, these approaches often neglect the local complementary information within multimodal features of complex 3D models, leading to suboptimal retrieval performance. To address this limitation, this study proposes a complementary global-local feature fusion method based on compact center loss. Deep features are first extracted from point cloud and multi-view data using pre-trained models. An attention-aware fusion module is then introduced, leveraging relation scores between point cloud and multi-view features to refine view features and integrate point cloud features, thereby capturing critical local complementary information. A multi-head attention mechanism within the feature dynamic aggregation module further explores potential modal representations among global point cloud features, global view features, and local complementary features, enhancing feature fusion while reducing redundancy. Finally, by jointly constraining compact center and cross-entropy losses, the method minimizes intra-class distances and maximizes inter-class separations, resulting in highly discriminative feature descriptors. Experimental results on the ModelNet40 and ModelNet10 datasets demonstrate that the proposed method achieves state-of-the-art performance, with an Overall Accuracy (OA) of 93.4% and 94.8% and a mean Average Precision (mAP) of 92.5% and 95.1%, respectively.
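As a rough illustration of the training objective described above, the sketch below combines a cross-entropy term with a center-style term that pulls each fused descriptor toward a learnable center of its class. The page does not give the exact formulation of the compact center loss, so the CompactCenterLoss module and the weight lambda_c are assumptions following the standard center-loss form, not the authors' implementation.

# Minimal PyTorch sketch of the joint objective L = L_ce + lambda_c * L_center.
# CompactCenterLoss and lambda_c are illustrative assumptions (standard center-loss form).
import torch
import torch.nn as nn


class CompactCenterLoss(nn.Module):
    """Pulls each fused descriptor toward a learnable center of its class (assumed form)."""

    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # features: (B, D) fused multimodal descriptors; labels: (B,) class indices.
        centers = self.centers[labels]               # (B, D) center of each sample's class
        return ((features - centers) ** 2).sum(dim=1).mean()


num_classes, feat_dim, lambda_c = 40, 512, 0.01     # lambda_c is a hypothetical weight
classifier = nn.Linear(feat_dim, num_classes)
ce_loss = nn.CrossEntropyLoss()
center_loss = CompactCenterLoss(num_classes, feat_dim)

features = torch.randn(8, feat_dim)                 # stand-in for fused descriptors
labels = torch.randint(0, num_classes, (8,))
loss = ce_loss(classifier(features), labels) + lambda_c * center_loss(features, labels)
loss.backward()

Cross-entropy drives inter-class separation while the center term compacts each class around its center, which matches the stated goal of minimizing intra-class distance while maximizing inter-class distance.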

Key words: multimodal fusion, 3D model, attention, complementary features, loss function
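The following PyTorch sketch illustrates, under stated assumptions, the two fusion stages named in the abstract: an attention-aware fusion step that reweights per-view features by relation scores against the point cloud feature, and a dynamic aggregation step that fuses the three feature streams with multi-head attention. Module names, layer choices, and dimensions (AttentionAwareFusion, DynamicAggregation, D = 512, V = 12 views) are illustrative and not taken from the paper.

# Illustrative sketch only; concrete layers and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionAwareFusion(nn.Module):
    """Reweights per-view features by their relation to the global point cloud feature."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, pc_feat, view_feats):
        # pc_feat: (B, D) global point cloud feature; view_feats: (B, V, D) per-view features.
        scores = torch.einsum("bd,bvd->bv", self.proj(pc_feat), view_feats)  # relation scores
        weights = F.softmax(scores / view_feats.shape[-1] ** 0.5, dim=1)     # (B, V)
        refined_views = (weights.unsqueeze(-1) * view_feats).sum(dim=1)      # (B, D)
        return refined_views + pc_feat                                       # local complementary feature


class DynamicAggregation(nn.Module):
    """Fuses the three feature streams with multi-head self-attention."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, pc_global, view_global, local_comp):
        tokens = torch.stack([pc_global, view_global, local_comp], dim=1)    # (B, 3, D)
        fused, _ = self.attn(tokens, tokens, tokens)
        return fused.mean(dim=1)                                             # (B, D) fused descriptor


# Toy usage with random tensors standing in for pre-trained backbone outputs.
B, V, D = 2, 12, 512
pc_feat, view_feats = torch.randn(B, D), torch.randn(B, V, D)
local_comp = AttentionAwareFusion(D)(pc_feat, view_feats)
descriptor = DynamicAggregation(D)(pc_feat, view_feats.mean(dim=1), local_comp)
print(descriptor.shape)  # torch.Size([2, 512])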