Skeleton Prior Enhanced Structure-Aware Point Cloud Model Retrieval

doi:10.19678/j.issn.1000-3428.0260245

Abstract

During feature extraction for point cloud retrieval, mainstream methods typically build global shape descriptors by hierarchically aggregating local neighborhood features. Their perception of object structure relies mainly on indirect inference from the spatial distribution of surface points and makes no direct use of explicit structural priors such as point cloud skeletons. To address this under-use of structural information when existing point cloud retrieval methods extract global shape features, this paper proposes a skeleton-prior-enhanced structure-aware point cloud retrieval network. The method introduces point cloud skeletons as structural priors, strengthens the structural expressiveness of point cloud features through a dual mechanism of explicit fusion and implicit guidance, and aggregates multi-scale features with an adaptive feature aggregation module to form the final global descriptor.

Specifically, the method consists of two modules. The first is a dual-branch feature fusion module. It extracts a skeleton point cloud from the input point cloud and then uses two independent PointNet++ branches to extract multi-scale local features from the original point cloud and from the skeleton. At each scale, a cross-attention mechanism takes the point cloud features as queries and the skeleton features as keys and values, so that skeleton structural information is fused into the point cloud features through attention weighting. In parallel, a contrastive learning task is constructed: a cropped point cloud combined with the complete skeleton serves as the anchor, the complete point cloud combined with the skeleton serves as the positive sample, and samples from other categories in the same batch serve as negative samples.
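The two ideas in the dual-branch module can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the fusion uses single-head scaled dot-product attention with hypothetical projection matrices `w_q`, `w_k`, `w_v` and an assumed residual connection, and the contrastive objective is written as an InfoNCE-style loss with cosine similarity and a hypothetical temperature `tau`, masking same-category pairs so that only other-category samples act as negatives. The function names are illustrative only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def skeleton_cross_attention(point_feats, skel_feats, w_q, w_k, w_v):
    """Fuse skeleton features (M, C) into point features (N, C) at one scale.

    Point features are the queries; skeleton features are the keys and values.
    w_q, w_k, w_v are (C, C) projection matrices (hypothetical, learned in practice).
    """
    q = point_feats @ w_q
    k = skel_feats @ w_k
    v = skel_feats @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, M) weights
    # Assumed residual connection: original features plus attended skeleton info.
    return point_feats + attn @ v


def info_nce(anchor, positive, labels, tau=0.07):
    """InfoNCE-style loss; row i of `positive` is the positive for row i of `anchor`.

    anchor:   (B, D) embeddings of (cropped point cloud + complete skeleton).
    positive: (B, D) embeddings of (complete point cloud + skeleton).
    labels:   (B,) category ids; same-category off-diagonal pairs are masked so
              only samples from other categories act as negatives.
    """
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / tau  # cosine similarity scaled by temperature
    same = labels[:, None] == labels[None, :]
    logits = np.where(same & ~np.eye(len(labels), dtype=bool), -np.inf, logits)
    row_max = logits.max(axis=1, keepdims=True)  # finite: diagonal is never masked
    log_norm = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(logits - log_norm)))


rng = np.random.default_rng(0)
pts, skel = rng.normal(size=(128, 64)), rng.normal(size=(32, 64))
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(64, 64)) for _ in range(3))
fused = skeleton_cross_attention(pts, skel, w_q, w_k, w_v)
print(fused.shape)  # (128, 64)

emb = rng.normal(size=(8, 32))
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
# Small loss: each anchor is nearly identical to its positive.
print(info_nce(emb, emb + 0.01 * rng.normal(size=emb.shape), labels))
```

In a full network, `skeleton_cross_attention` would be applied once per PointNet++ scale with its own learned projections, and the embeddings passed to `info_nce` would come from the fused encoder rather than from random vectors.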
The contrastive loss implicitly guides the model to learn structural consistency, so that the module forms a dual structural enhancement mechanism of “explicit fusion + implicit guidance”.

The second is a multi-scale Vector Local Adaptive Aggregation Descriptor (MVLAAD) module, which consists of the Vector Local Adaptive Aggregation Descriptor (VLAAD) and a multi-scale aggregation enhancement stage. VLAAD is built on a lightweight Transformer decoder: it takes the local feature sequence of the input point cloud as keys and values and a set of initial generic cluster centers as queries, and refines the queries iteratively through multiple cross-attention layers to generate personalized cluster centers adapted to each input sample. A momentum update then combines the generic and personalized centers, balancing adaptability against stability. Using the updated cluster centers, soft-assignment weights are computed for the refined features at each scale, and the residuals are aggregated into global descriptors at three scales. Finally, a gating mechanism enhances these features and outputs the compact global descriptor.

During training, a dynamic weight adjustment strategy combines the classification, triplet, and contrastive losses. Training emphasizes contrastive learning in the early stage and shifts the focus to the triplet loss in the later stage, reinforcing both structure perception and discriminative learning.

Experimental results show that the proposed method achieves 82.6% mAP on ModelNet40, outperforming the state-of-the-art method CF3D by 1.3%, and 84.6% mAP on ShapeNet, exceeding existing methods. Ablation studies verify the effectiveness of each module: the PointNet++ baseline achieves 62.0% mAP, and the full model raises this to 82.6%. A lightweight version reduces the parameter count to 20.34M and further improves mAP to 84.7%.
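The MVLAAD aggregation and the training schedule described above can also be sketched in NumPy. Again this is a hedged illustration, not the authors' code: the decoder is reduced to stacked cross-attention layers without feed-forward or normalization sublayers, the momentum update is assumed to be the convex blend `m * generic + (1 - m) * personalized`, soft assignment follows NetVLAD-style distance-based weighting with a hypothetical sharpness `alpha`, the gate is assumed to be a context-gating layer, and the linear loss schedule in `loss_weights` is hypothetical. All per-scale features are assumed to have already been projected to a common channel width `C`.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def personalize_centers(generic, feats, layers, momentum=0.9):
    """Adapt generic centers (K, C) to one sample's local features (N, C).

    `layers` holds one (w_q, w_k, w_v) tuple per cross-attention layer
    (hypothetical weights); centers are the queries, features the keys/values.
    """
    centers = generic
    for w_q, w_k, w_v in layers:
        q, k, v = centers @ w_q, feats @ w_k, feats @ w_v
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (K, N)
        centers = centers + attn @ v  # residual update of the queries
    # Assumed momentum blend: keep a share of the generic centers for stability.
    return momentum * generic + (1.0 - momentum) * centers


def soft_vlad(feats, centers, alpha=10.0):
    """Soft-assign features to centers and aggregate residuals -> (K * C,)."""
    resid = feats[:, None, :] - centers[None, :, :]            # (N, K, C)
    assign = softmax(-alpha * (resid ** 2).sum(-1), axis=1)     # (N, K)
    vlad = (assign[:, :, None] * resid).sum(axis=0)             # (K, C)
    vlad = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12)  # intra-norm
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)                # global L2 norm


def context_gate(x, w, b):
    """Context gating: scale each dimension by a sigmoid gate, x * sigmoid(Wx + b)."""
    return x / (1.0 + np.exp(-(w @ x + b)))


def mvlaad_descriptor(scale_feats, generic, layers, gate_w, gate_b):
    """Build one gated descriptor from a list of per-scale features (N_s, C)."""
    # Assumption: the key/value sequence is the concatenation of all scales.
    centers = personalize_centers(generic, np.concatenate(scale_feats, axis=0), layers)
    desc = np.concatenate([soft_vlad(f, centers) for f in scale_feats])
    return context_gate(desc, gate_w, gate_b)


def loss_weights(epoch, total_epochs):
    """Hypothetical linear schedule: contrastive weight decays, triplet weight grows."""
    t = epoch / max(total_epochs - 1, 1)
    return {"cls": 1.0, "con": 1.0 - t, "tri": t}


rng = np.random.default_rng(0)
C, K = 8, 4
scales = [rng.normal(size=(n, C)) for n in (256, 64, 16)]
generic = rng.normal(size=(K, C))
layers = [tuple(rng.normal(scale=0.1, size=(C, C)) for _ in range(3)) for _ in range(2)]
D = len(scales) * K * C
desc = mvlaad_descriptor(scales, generic, layers, rng.normal(scale=0.01, size=(D, D)), np.zeros(D))
print(desc.shape)  # (96,)
print(loss_weights(0, 50), loss_weights(49, 50))
```

The gate in this sketch preserves the descriptor length; any projection that makes the descriptor more compact is omitted, as are the learned parameters that training would fit.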
Robustness experiments show that the method remains stable under moderate sparsity, low-level noise, and mild occlusion, but its performance drops markedly under extreme degradation conditions.

In summary, the proposed skeleton-prior-enhanced structure-aware network remedies the insufficient use of structural information in existing methods by explicitly fusing skeleton priors and implicitly guiding learning through contrastive learning, and the MVLAAD module improves the discriminability of the global descriptor by dynamically generating personalized cluster centers.


Sun Wanjie, Zhang Hong, Li Haojie. Skeleton Prior Enhanced Structure-Aware Point Cloud Model Retrieval[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0260245.



URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0260245

References

[1] HUI L, YANG H, CHENG M, et al. Pyramid point cloud transformer for large-scale place recognition[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE, 2021: 6078-6087.
[2] LI X L, LI F S, ZHANG S S. A local feature retrieval method for 3D CAD models based on point cloud deep learning[J]. Mechanical Science and Technology for Aerospace Engineering, 2025, 44(12): 2161-2173. DOI: 10.13433/j.cnki.1003-8728.20230373.
[3] WANG Z X. Design and implementation of a 3D model retrieval system for industrial parts[D]. Dalian: Dalian University of Technology, 2024. DOI: 10.26991/d.cnki.gdllu.2024.004341.
[4] QI C R, LI Y, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space [EB/OL]. (2017) [2024-07-11]. https://arxiv.org/abs/1706.02413.
[5] XU Z, ZHANG R, LI Z, et al. CF3d: category fused 3D point cloud retrieval[J]. Signal Processing, 2025, 230: 109805. DOI: 10.1016/j.sigpro.2024.109805.
[6] WANG Y, SUN Y, LIU Z, et al. Dynamic graph CNN for learning on point clouds [EB/OL]. (2019) [2024-07-11]. https://arxiv.org/abs/1801.07829.
[7] LI J, WANG J, XU T. PointGL: a simple global-local framework for efficient point cloud analysis[J]. IEEE Transactions on Multimedia, 2024, 26: 6931-6942. DOI: 10.1109/TMM.2024.3358695.
[8] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [EB/OL]. (2017) [2024-07-11]. https://arxiv.org/abs/1706.03762.
[9] SHAJAHAN D A, VARMA T M, MUTHUGANAPATHY R. Point transformer for shape classification and retrieval of urban roof point clouds[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 1-5.
[10] GUO M H, CAI J X, LIU Z N, et al. PCT: point cloud transformer[J]. Computational Visual Media, 2021, 7(2): 187-199.
[11] GAO Z, LI Q, SHEN L. DAP-MAE: domain-adaptive point cloud masked autoencoder for effective cross-domain learning [EB/OL]. (2025) [2024-07-11]. https://arxiv.org/abs/2510.21635.
[12] LONG L Y, JIAO S C, GUO L, et al. Research on multimodal 3D model retrieval based on compact center[J]. Computer Engineering, 2025, 51(2): 322-334. DOI: 10.19678/j.issn.1000-3428.0069685.
[13] XU Y, FENG Y, ZHANG J, et al. Assembly fuzzy representation on hypergraph for open-set 3D object retrieval[C]//Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024 (NeurIPS 2024). Vancouver, Canada, December 10-15, 2024.
[14] ZHANG R, GUO Z, ZHANG W, et al. PointCLIP: point cloud understanding by CLIP [EB/OL]. (2021) [2024-07-11]. https://arxiv.org/abs/2112.02413.
[15] XUE L, GAO M, XING C, et al. ULIP: learning a unified representation of language, images, and point clouds for 3D understanding[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2023: 1179-1189.
[16] XUE L, YU N, ZHANG S, et al. ULIP-2: towards scalable multimodal pre-training for 3D understanding[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2024: 27091-27101.
[17] FENG Y, JI S, LIU Y S, et al. Hypergraph-based multi-modal representation for open-set 3D object retrieval[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(4): 2206-2223.
[18] WEI R, CUI H, LIU Y, et al. Contrastive masked auto-encoders based self-supervised hashing for 2D image and 3D point cloud cross-modal retrieval [EB/OL]. (2024) [2024-07-11]. https://arxiv.org/abs/2408.05711.
[19] LIN C, LI C, LIU Y, et al. Point2Skeleton: learning skeletal representations from point clouds [EB/OL]. (2020) [2024-07-11]. https://arxiv.org/abs/2012.00230.
[20] MARKS E A, NUNES L, MAGISTRI F, et al. Tree skeletonization from 3D point clouds by denoising diffusion[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025: 27607-27617.
[21] ONGHENA P, VELASCO-FORERO S, MARCOTEGUI B. MorphoSkEL3D: morphological skeletonization of 3D point clouds for informed sampling in object classification and retrieval [EB/OL]. (2025) [2024-07-11]. https://arxiv.org/abs/2501.12974.
[22] WEN C, YU B, TAO D. Learnable skeleton-aware 3D point cloud sampling[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2023: 17671-17681.
[23] KREUTZ T, MÜHLHÄUSER M, GUINEA A S. DeSPITE: exploring contrastive deep skeleton-pointcloud-IMU-text embeddings for advanced point cloud human activity understanding[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025: 14633-14643.
[24] ARANDJELOVIĆ R, GRONAT P, TORII A, et al. NetVLAD: CNN architecture for weakly supervised place recognition [EB/OL]. (2016) [2024-07-11]. https://arxiv.org/abs/1511.07247.
[25] UY M A, LEE G H. PointNetVLAD: deep point cloud based retrieval for large-scale place recognition [EB/OL]. (2018) [2024-07-11]. https://arxiv.org/abs/1804.03492.
[26] WU Z, SONG S, KHOSLA A, et al. 3D ShapeNets: a deep representation for volumetric shapes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015: 1912-1920.
[27] CHANG A X, FUNKHOUSER T, GUIBAS L, et al. ShapeNet: an information-rich 3D model repository [EB/OL]. (2015). https://arxiv.org/abs/1512.03012.
[28] QI C R, SU H, MO K, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2017: 77-85.
[29] THOMAS H, QI C R, DESCHAUD J E, et al. KPConv: flexible and deformable convolution for point clouds [EB/OL]. (2019) [2024-07-11]. https://arxiv.org/abs/1904.08889.
[30] ZHAO T, FENG Q, JADHAV S, et al. CORSair: convolutional object retrieval and symmetry-aided registration[C]//Proceedings of 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Piscataway, USA: IEEE Press, 2021: 47-54.
[31] CHEN H, LIU S, CHEN W, et al. Equivariant point network for 3D point cloud analysis[C]//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, USA: IEEE Press, 2021: 14509-14518.
[32] KADAM P, ZHOU Q, LIU S, et al. PCRP: unsupervised point cloud object retrieval and pose estimation[C]//Proceedings of 2022 IEEE International Conference on Image Processing (ICIP). Washington D. C., USA: IEEE Press, 2022: 1596-1600.
[33] SALIHU D, STEINBACH E. SGPCR: spherical Gaussian point cloud representation and its application to object registration and retrieval[C]//Proceedings of 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Washington D. C., USA: IEEE Press, 2023: 572-581.
[34] SHI J, XIAO J, HU X, et al. Enhancing point cloud analysis via neighbor aggregation correction based on cross-stage structure correlation[J]. The Visual Computer, 2025, 41: 11797-11813.
[35] ESTEVES C, ALLEN-BLANCHETTE C, MAKADIA A, et al. Learning SO(3) equivariant representations with spherical CNNs [EB/OL]. (2018) [2024-07-11]. https://arxiv.org/abs/1711.06721.
[36] RAO Y, LU J, ZHOU J. Spherical fractal convolutional neural networks for point cloud recognition[C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, USA: IEEE Press, 2019: 452-460.
[37] ZHANG Z, HUA B S, CHEN W, et al. Global context aware convolutions for 3D point cloud understanding[C]//Proceedings of 2020 International Conference on 3D Vision (3DV). Los Alamitos, USA: IEEE Computer Society, 2020: 210-219.
[38] LI X, LI R, CHEN G, et al. A rotation-invariant framework for deep point cloud analysis[J]. IEEE Transactions on Visualization and Computer Graphics, 2022, 28(12): 4503-4514.
[39] YU J, ZHANG C, CAI W. Rethinking rotation invariance with point cloud registration [EB/OL]. (2022) [2024-07-11]. https://arxiv.org/abs/2301.00149.
[40] HAMDI A, GIANCOLA S, GHANEM B. Voint cloud: multi-view point cloud representation for 3D understanding[C]//Proceedings of the Eleventh International Conference on Learning Representations (ICLR). [S. l.]: ICLR, 2023.
[41] LIN D, CHENG Y, GUO A, et al. SCA-PVNet: self-and-cross attention based aggregation of point cloud and multi-view for 3D object retrieval [EB/OL]. (2023) [2024-03-22]. https://arxiv.org/abs/2307.10601.
