
Computer Engineering ›› 2025, Vol. 51 ›› Issue (7): 140-151. doi: 10.19678/j.issn.1000-3428.0069395

• Artificial Intelligence and Pattern Recognition •

Research on Lightweight Algorithm for Garbage Classification Based on Visual Large Model

ZHANG Yubo, YANG Fan*, GUO Ya, YANG Wenhui

  1. School of Electronic Information Engineering, Hebei University of Technology, Tianjin 300401, China
  • Received: 2024-02-21 Online: 2025-07-15 Published: 2024-06-13
  • Contact: YANG Fan
  • Funding:
    Shijiazhuang Major Special Project of Science and Technology Cooperation (SJZZXA23005)


Abstract:

As deep learning technology progresses rapidly, it is increasingly applied to garbage classification, significantly improving classification accuracy and efficiency. However, practical deployment still faces many challenges, such as high data acquisition and annotation costs, insufficient model generalization, and difficulty in meeting real-time requirements. To address these issues, this paper proposes LSM-PPLCNet, a lightweight garbage classification algorithm that combines a visual large model with PP-LCNet. LSM-PPLCNet couples the powerful feature extraction capability of visual large models with a lightweight model design, ensuring that the model meets real-time requirements while achieving higher accuracy on a self-built garbage classification dataset. First, a semi-supervised training strategy based on the CLIP large model mines unlabeled data to enrich the training samples and reduce manual annotation costs. Second, knowledge distillation is applied, with the high-precision CLIP large model serving as the teacher to guide the training of the lightweight PP-LCNet network. Finally, a weighted loss based on the large model is proposed: each image is assigned its own weight in the loss function, so that the model can adjust each sample's contribution according to image quality. Experimental results on the self-built garbage classification dataset show that, compared with the baseline PP-LCNet classification model, LSM-PPLCNet improves Top-1 Accuracy by 4.03 percentage points without affecting inference speed, and it also shows clear advantages over other mainstream models. These results demonstrate that LSM-PPLCNet achieves a balance between accuracy and speed in garbage classification tasks.
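The three ingredients described above (CLIP-based pseudo-labeling of unlabeled data, teacher-student distillation, and a per-image weighted loss) can be illustrated together in a short sketch. The following NumPy code is not the authors' implementation: the confidence threshold, the temperature `T`, the mixing coefficient `alpha`, and the use of the teacher's top-1 confidence as a stand-in for image quality are all assumptions made here for illustration only.

```python
import numpy as np

def softmax(z, T=1.0):
    """Row-wise softmax with optional temperature T."""
    z = z / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mine_pseudo_labels(teacher_logits, threshold=0.9):
    """Semi-supervised mining: keep only the unlabeled samples the
    teacher classifies confidently, and use its argmax as the label."""
    p = softmax(teacher_logits)
    conf = p.max(axis=1)
    keep = conf >= threshold
    return np.flatnonzero(keep), p.argmax(axis=1)[keep]

def weighted_distill_loss(student_logits, teacher_logits, labels,
                          T=4.0, alpha=0.5):
    """Per-image weighted combination of hard-label cross-entropy and
    a distillation term (KL from teacher to student at temperature T)."""
    n = student_logits.shape[0]
    p_s = softmax(student_logits)        # student probabilities
    p_s_T = softmax(student_logits, T)   # temperature-softened student
    p_t_T = softmax(teacher_logits, T)   # temperature-softened teacher
    # hard-label cross-entropy per sample
    ce = -np.log(p_s[np.arange(n), labels] + 1e-12)
    # KL(teacher || student) per sample, scaled by T^2 as usual in distillation
    kd = (p_t_T * (np.log(p_t_T + 1e-12) - np.log(p_s_T + 1e-12))).sum(axis=1) * T * T
    # per-image weight: teacher top-1 confidence as a proxy for image quality
    w = softmax(teacher_logits).max(axis=1)
    w = w / w.sum()
    per_sample = (1 - alpha) * ce + alpha * kd
    return float((w * per_sample).sum())
```

In this sketch a low-quality or ambiguous image earns a smaller weight `w`, shrinking its share of the total loss, which mirrors the abstract's idea of letting each image's quality determine its proportion in the loss function.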

Key words: garbage classification, visual large model, weighted loss, semi-supervised, knowledge distillation