
Computer Engineering ›› 2025, Vol. 51 ›› Issue (7): 140-151. doi: 10.19678/j.issn.1000-3428.0069395

• Artificial Intelligence and Pattern Recognition •

Research on Lightweight Algorithm for Garbage Classification Based on Visual Large Model

ZHANG Yubo, YANG Fan*, GUO Ya, YANG Wenhui

  1. School of Electronic Information Engineering, Hebei University of Technology, Tianjin 300401, China
  • Received: 2024-02-21 Online: 2025-07-15 Published: 2024-06-13
  • Contact: YANG Fan
  • Funding:
    Shijiazhuang Major Special Project of Science and Technology Cooperation (SJZZXA23005)


Abstract:

As deep learning technology progresses rapidly, it is increasingly applied to garbage classification, significantly improving classification accuracy and efficiency. However, practical deployment still faces many challenges, such as high data acquisition and annotation costs, insufficient model generalization, and difficulty in meeting real-time requirements. To address these issues, this paper proposes LSM-PPLCNet, a lightweight garbage classification algorithm that combines a visual large model with PP-LCNet. LSM-PPLCNet couples the powerful feature extraction capability of visual large models with a lightweight model design, ensuring that the model meets real-time requirements while achieving higher accuracy on a self-built garbage classification dataset. First, a semi-supervised training strategy based on the CLIP large model mines unlabeled data to enrich the training samples and reduce manual annotation costs. Second, knowledge distillation is applied, with the high-precision CLIP large model serving as the teacher to guide the training of the lightweight PP-LCNet network. Finally, a weighted loss based on the large model is proposed: each image is assigned its own weight in the loss function, so that the model can adjust each sample's contribution according to image quality. Experimental results on the self-built garbage classification dataset show that, compared with the baseline PP-LCNet classification model, LSM-PPLCNet improves Top-1 Accuracy by 4.03 percentage points without affecting inference speed, and it also shows clear advantages over other mainstream models. These results demonstrate that LSM-PPLCNet achieves a balance between accuracy and speed in garbage classification tasks.
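The three ingredients described above (CLIP-based pseudo-labeling of unlabeled data, teacher-student distillation, and a per-image weighted loss) can be illustrated together in a short sketch. The following NumPy code is not the authors' implementation: the confidence threshold, the temperature `T`, the mixing coefficient `alpha`, and the use of the teacher's top-1 confidence as a stand-in for image quality are all assumptions made here for illustration only.

```python
import numpy as np

def softmax(z, T=1.0):
    """Row-wise softmax with optional temperature T."""
    z = z / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mine_pseudo_labels(teacher_logits, threshold=0.9):
    """Semi-supervised mining: keep only the unlabeled samples the
    teacher classifies confidently, and use its argmax as the label."""
    p = softmax(teacher_logits)
    conf = p.max(axis=1)
    keep = conf >= threshold
    return np.flatnonzero(keep), p.argmax(axis=1)[keep]

def weighted_distill_loss(student_logits, teacher_logits, labels,
                          T=4.0, alpha=0.5):
    """Per-image weighted combination of hard-label cross-entropy and
    a distillation term (KL from teacher to student at temperature T)."""
    n = student_logits.shape[0]
    p_s = softmax(student_logits)        # student probabilities
    p_s_T = softmax(student_logits, T)   # temperature-softened student
    p_t_T = softmax(teacher_logits, T)   # temperature-softened teacher
    # hard-label cross-entropy per sample
    ce = -np.log(p_s[np.arange(n), labels] + 1e-12)
    # KL(teacher || student) per sample, scaled by T^2 as usual in distillation
    kd = (p_t_T * (np.log(p_t_T + 1e-12) - np.log(p_s_T + 1e-12))).sum(axis=1) * T * T
    # per-image weight: teacher top-1 confidence as a proxy for image quality
    w = softmax(teacher_logits).max(axis=1)
    w = w / w.sum()
    per_sample = (1 - alpha) * ce + alpha * kd
    return float((w * per_sample).sum())
```

In this sketch a low-quality or ambiguous image earns a smaller weight `w`, shrinking its share of the total loss, which mirrors the abstract's idea of letting each image's quality determine its proportion in the loss function.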

Key words: garbage classification, visual large model, weighted loss, semi-supervised, knowledge distillation