
Computer Engineering

   

Skeleton Prior Enhanced Structure-Aware Point Cloud Model Retrieval

  

  • Published: 2026-06-29

骨架先验增强结构感知的点云检索

Abstract: In point cloud retrieval, mainstream methods typically build global shape descriptors by hierarchically aggregating local neighborhood features. They perceive object structure mainly through indirect inference from the spatial distribution of surface points and make no direct use of explicit structural priors such as point cloud skeletons. To address this insufficient use of structural information when extracting global shape features, this paper proposes a skeleton-prior enhanced structure-aware point cloud retrieval network. The method introduces point cloud skeletons as structural priors, strengthens the structural expressiveness of point cloud features through a dual mechanism of explicit fusion and implicit guidance, and uses an adaptive feature aggregation module to aggregate multi-scale features into the final global descriptor. The method consists of two modules. The first is a dual-branch feature fusion module. It extracts a skeleton point cloud from the input point cloud and then uses two independent PointNet++ branches to extract multi-scale local features from the original point cloud and the skeleton, respectively. At each scale, a cross-attention mechanism takes the point cloud features as queries and the skeleton features as keys and values, so that skeleton structural information is fused into the point cloud features through the attention weights. In parallel, a contrastive learning task is constructed: a cropped point cloud paired with the complete skeleton serves as the anchor, the complete point cloud paired with the skeleton serves as the positive sample, and samples from other categories within the batch serve as negative samples.
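The per-scale cross-attention fusion described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, no learned query/key/value projections, and a residual connection, none of which the abstract specifies. The function names `softmax` and `cross_attention_fuse` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(point_feats, skel_feats):
    """Fuse skeleton features into point features at one scale.

    point_feats: N feature vectors of length D, used as queries.
    skel_feats:  M feature vectors of length D, used as keys and values.
    Returns N fused feature vectors of length D.
    """
    dim = len(point_feats[0])
    scale = 1.0 / math.sqrt(dim)  # standard scaled dot-product attention
    fused = []
    for q in point_feats:
        # Similarity of this point feature to every skeleton feature.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in skel_feats]
        weights = softmax(scores)
        # Attention-weighted sum of skeleton features (the values).
        attended = [
            sum(w * v[d] for w, v in zip(weights, skel_feats))
            for d in range(dim)
        ]
        # Residual connection: an assumption, not stated in the abstract.
        fused.append([a + b for a, b in zip(q, attended)])
    return fused
```

In a multi-scale setting, a function like this would be applied once per scale to the paired outputs of the two PointNet++ branches; a real model would add learned projections and multiple heads.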
The contrastive loss implicitly guides the model to learn structural consistency, which yields a dual “explicit fusion + implicit guidance” structural enhancement mechanism. The second is a multi-scale local adaptive aggregation descriptor (MVLAAD) module, which consists of the Vector Local Adaptive Aggregation Descriptor (VLAAD) and multi-scale aggregation enhancement. Built on a lightweight Transformer decoder, VLAAD takes the local feature sequence of the input point cloud as keys and values and a set of initial generic cluster centers as queries. Iterative updates through multiple cross-attention layers dynamically generate personalized cluster centers adapted to each input sample. A momentum update strategy then combines the generic centers with the personalized centers to balance adaptability and stability. Using the updated cluster centers, soft assignment weights are computed for the refined multi-scale features and the residuals are aggregated, producing global descriptors at three scales. Finally, a gating mechanism enhances these features and outputs the compact global descriptor. Training uses a dynamic weight adjustment strategy that combines classification loss, triplet loss, and contrastive loss: it emphasizes contrastive learning in the early stage and shifts toward triplet loss in the later stage, reinforcing both structure perception and discriminative learning. Experimental results show that the proposed method achieves 82.6% mAP on the ModelNet40 dataset, outperforming the state-of-the-art method CF3D by 1.3%, and attains 84.6% mAP on ShapeNet, exceeding existing methods. Ablation studies verify the effectiveness of each module: the baseline PointNet++ achieves 62.0% mAP, while the full model reaches 82.6%. A lightweight version reduces the number of parameters to 20.34M and further improves mAP to 84.7%.
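The momentum blending of cluster centers and the soft-assignment residual aggregation in VLAAD can be sketched as follows. This is a hedged illustration rather than the paper's code: the personalized centers are taken as given (in the paper they come from a Transformer decoder), the soft assignment is assumed to be a softmax over negative squared distances, the momentum is assumed to weight the generic centers, and the final L2 normalization follows common VLAD-style practice. All function names and the default `momentum=0.9` are hypothetical.

```python
import math


def blend_centers(generic, personalized, momentum=0.9):
    """Momentum blend: momentum * generic + (1 - momentum) * personalized."""
    return [
        [momentum * g + (1.0 - momentum) * p for g, p in zip(gc, pc)]
        for gc, pc in zip(generic, personalized)
    ]


def soft_assign(feat, centers):
    """Softmax over negative squared distances to each center (assumed form)."""
    logits = [-sum((f - c) ** 2 for f, c in zip(feat, ctr)) for ctr in centers]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_residuals(feats, centers):
    """Sum soft-weighted residuals per center and flatten to one descriptor."""
    dim = len(centers[0])
    desc = [[0.0] * dim for _ in centers]
    for feat in feats:
        weights = soft_assign(feat, centers)
        for k, (w, ctr) in enumerate(zip(weights, centers)):
            for d in range(dim):
                desc[k][d] += w * (feat[d] - ctr[d])
    flat = [v for row in desc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    # L2-normalize unless every residual is exactly zero.
    return [v / norm for v in flat] if norm > 0 else flat
```

Under these assumptions, running `aggregate_residuals` once per scale on the blended centers would give the three per-scale descriptors that the gating mechanism then combines; the gating step itself is not sketched here because the abstract does not describe its form.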
Robustness experiments show that the method remains stable under moderate sparsity, low-level noise, and mild occlusion, but its performance degrades significantly under extreme degradation. In summary, the proposed network addresses the insufficient use of structural information in existing methods through explicit skeleton prior fusion and implicit contrastive learning guidance, and the MVLAAD module improves the discriminability of the global descriptor by dynamically generating personalized cluster centers.
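The dynamic loss weighting mentioned in the abstract (contrastive loss emphasized early, triplet loss emphasized late) could take many forms. The sketch below assumes a simple linear schedule over epochs with a constant classification weight; the actual schedule and weight values are not given in the abstract, and `loss_weights` and `total_loss` are hypothetical names.

```python
def loss_weights(epoch, total_epochs, w_cls=1.0):
    """Return (w_cls, w_triplet, w_contrastive) for a zero-based epoch.

    Assumed linear schedule: the contrastive weight decays from 1 to 0
    while the triplet weight grows from 0 to 1 over training.
    """
    progress = epoch / max(total_epochs - 1, 1)
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    w_con = 1.0 - progress
    w_tri = progress
    return w_cls, w_tri, w_con


def total_loss(l_cls, l_tri, l_con, epoch, total_epochs):
    """Weighted sum of the three loss terms at the given epoch."""
    w_cls, w_tri, w_con = loss_weights(epoch, total_epochs)
    return w_cls * l_cls + w_tri * l_tri + w_con * l_con
```

With 11 epochs, for example, this schedule weights the contrastive term fully at epoch 0, splits evenly between contrastive and triplet at epoch 5, and relies on triplet plus classification loss at epoch 10.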
