
Computer Engineering ›› 2025, Vol. 51 ›› Issue (10): 327-335. doi: 10.19678/j.issn.1000-3428.0069648

• Graphics and Image Processing •

RGB-D Saliency Object Detection Based on Sparse Contrastive Self-Distillation

YU Yangyang1, WU Dunquan1,*, CHEN Chenglizhao1,2

1. College of Computer Science and Technology, Qingdao University, Qingdao 266071, Shandong, China
    2. College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 257061, Shandong, China
• Received: 2024-03-25 Revised: 2024-06-13 Online: 2025-10-15 Published: 2024-08-15
  • Contact: WU Dunquan

• Supported by: National Natural Science Foundation of China (62172246); Youth Innovation Team Development Program of Shandong Higher Education Institutions (2021KJ062)

Abstract:

In recent years, Red Green Blue-Depth (RGB-D) saliency object detection has made significant progress, with notable improvements in performance. However, its dependence on complex, resource-intensive architectures limits its application in resource-constrained environments. Although lightweight networks improve on model size and speed, they typically do so at the cost of performance. To address this challenge, an innovative lightweight solution is proposed that overcomes these limitations by streamlining network parameters while enhancing performance: an effective, general training strategy based on sparse contrastive self-distillation, which compresses and accelerates existing RGB-D saliency detection models while improving their accuracy. The strategy comprises two key techniques: sparse self-distillation and adversarial contrastive learning. Sparse self-distillation eliminates unnecessary parameters in the saliency detection model while retaining the key ones, yielding more efficient and effective saliency prediction. Adversarial contrastive learning, in turn, corrects potential errors, further refining the self-distillation process and improving overall model performance. Experimental results on the NJUD, NLPR, LFSD, ReDWeb-S, and COME15K benchmark datasets demonstrate that, compared with other State-of-The-Art (SOTA) methods, the proposed method produces more accurate saliency detection results. Furthermore, comparisons with existing SOTA lightweight RGB-D saliency detection models confirm that the proposed method achieves model-size reduction and performance enhancement simultaneously, without sacrificing accuracy.
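The sparse self-distillation idea described above can be illustrated with a minimal, hypothetical sketch (not the authors' implementation; all function names and weighting factors here are illustrative assumptions): a student objective combining a supervised saliency loss, a distillation term that pulls the compact student's predictions toward the full model's, and an L1 penalty that drives non-essential student parameters toward zero so they can be pruned.

```python
import math

def bce(pred, target):
    # Supervised term: binary cross-entropy between predicted and
    # ground-truth per-pixel saliency values.
    eps = 1e-7
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, target)) / len(pred)

def mse(a, b):
    # Distillation term: mean squared error pulling the student's
    # saliency map toward the (larger) teacher's.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def l1(params):
    # Sparsity term: L1 penalty drives unnecessary parameters toward
    # zero, enabling pruning while key parameters are retained.
    return sum(abs(w) for w in params)

def sparse_self_distill_loss(student_pred, teacher_pred, gt,
                             student_params, alpha=0.5, lam=1e-4):
    """Hypothetical combined objective: supervision + distillation + sparsity."""
    return (bce(student_pred, gt)
            + alpha * mse(student_pred, teacher_pred)
            + lam * l1(student_params))
```

For example, `sparse_self_distill_loss([0.9, 0.1], [0.95, 0.05], [1.0, 0.0], [0.2, -0.1])` yields a small positive loss that shrinks as the student matches both the ground truth and the teacher and as its parameters become sparse. The abstract's adversarial contrastive component, which corrects errors in the distillation process, would add a further loss term not sketched here.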

Key words: Red Green Blue-Depth (RGB-D) saliency object detection, sparse self-distillation, contrastive learning
