
Computer Engineering, 2024, Vol. 50, Issue (9): 179-188. doi: 10.19678/j.issn.1000-3428.0068378

• Artificial Intelligence and Pattern Recognition •

Wild Mushroom Classification Based on Multi-level Region Selection and Cross-layer Feature Fusion

LI Junyi, LI Xiangyang, LONG Chaoxun, LI Haiyan, LI Hongsong, YU Pengfei*

  1. School of Information, Yunnan University, Kunming 650000, Yunnan, China
  • Received: 2023-09-12  Online: 2024-03-06  Published: 2024-09-15
  • Corresponding author: YU Pengfei
  • Funding: National Natural Science Foundation of China (62066046)


Abstract:

Poisoning incidents caused by the accidental consumption of poisonous wild mushrooms have become increasingly frequent in recent years and pose a serious threat to public health, making accurate identification of wild mushrooms particularly important. However, existing wild mushroom classification algorithms are prone to recognition errors on images with heavy background noise, small inter-class differences, or large intra-class differences. To address this problem, this paper proposes a wild mushroom classification algorithm based on the Vision Transformer (ViT) architecture, combined with multi-level region selection and cross-layer feature fusion. The algorithm aims to capture highly discriminative features so that the network focuses on the essential information, thereby improving classification accuracy. First, ViT serves as the network framework for extracting features and global contextual information from wild mushroom images. Second, a multi-head self-attention selection module is designed to extract discriminative tokens, and an adaptive allocation algorithm determines how many tokens are extracted from encoder layers at different levels. Finally, to further improve classification performance, a cross-layer feature fusion strategy and a label-smoothing loss are introduced during training to reduce the loss of detailed information. To make the network's learning of wild mushroom image features more targeted, a dedicated wild mushroom dataset was constructed. Experimental results show that the proposed algorithm substantially improves classification accuracy over the baseline, reaching an accuracy of 98.65%.
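The abstract describes the token selection module and the adaptive allocation algorithm only at a high level. The sketch below shows one plausible reading under explicitly hypothetical assumptions: a patch token's importance is its CLS-to-patch attention averaged over heads; each encoder layer's share of a fixed token budget is proportional to how concentrated its attention is (1 minus normalized entropy); and cross-layer fusion means concatenating the selected tokens of all layers after the final CLS feature. None of these rules, nor the names `head_averaged_scores`, `allocate_budget`, or `select_and_fuse`, come from the paper, and plain Python lists stand in for tensors.

```python
import math


def head_averaged_scores(cls_attn):
    """cls_attn[h][j] is the attention from the CLS token to patch token j in head h.
    Returns one importance score per patch token: the mean over all heads."""
    num_heads = len(cls_attn)
    return [sum(head[j] for head in cls_attn) / num_heads
            for j in range(len(cls_attn[0]))]


def top_k_indices(scores, k):
    """Indices of the k highest-scoring tokens, highest first (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def concentration(scores):
    """Hypothetical layer weight: 1 - normalized entropy of the (positive) scores.
    Near 1 for sharply peaked attention, 0 for uniform attention; needs >= 2 tokens."""
    total = sum(scores)
    entropy = -sum((s / total) * math.log(s / total) for s in scores if s > 0)
    return max(0.0, 1.0 - entropy / math.log(len(scores)))


def allocate_budget(layer_scores, total_k):
    """Hypothetical adaptive allocation: split total_k token slots across layers in
    proportion to concentration(), with largest-remainder rounding to reach total_k."""
    weights = [concentration(s) for s in layer_scores]
    if sum(weights) == 0:
        weights = [1.0] * len(weights)  # every layer uniform: fall back to an equal split
    w_sum = sum(weights)
    raw = [total_k * w / w_sum for w in weights]
    counts = [int(r) for r in raw]
    # Hand the leftover slots to the layers with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:max(0, total_k - sum(counts))]:
        counts[i] += 1
    return counts


def select_and_fuse(cls_feature, layers, total_k):
    """layers: one (cls_attn, patch_feats) pair per encoder layer.
    Returns [final CLS feature] + the selected patch features of every layer,
    in layer order and, within a layer, in descending score order."""
    layer_scores = [head_averaged_scores(attn) for attn, _ in layers]
    counts = allocate_budget(layer_scores, total_k)
    fused = [cls_feature]
    for (_, patch_feats), scores, k in zip(layers, layer_scores, counts):
        fused.extend(patch_feats[j] for j in top_k_indices(scores, k))
    return fused


# Usage: a layer with uniform attention gets no slots; a peaked layer gets both.
layers = [
    ([[0.25] * 4, [0.25] * 4], [[10.0], [20.0], [30.0], [40.0]]),
    ([[0.7, 0.1, 0.1, 0.1], [0.5, 0.3, 0.1, 0.1]], [[1.0], [2.0], [3.0], [4.0]]),
]
print(select_and_fuse([9.0], layers, total_k=2))  # [[9.0], [1.0], [2.0]]
```

Under this rule, layers whose attention is spread evenly over the image contribute few or no tokens, while layers with sharply peaked attention contribute more; the paper's actual allocation criterion and fusion operator may differ.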

Key words: image classification, Vision Transformer (ViT) architecture, feature selection, adaptive allocation, feature fusion, label smoothing
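Label smoothing, listed among the key words, is a standard technique, but the abstract does not give the smoothing factor or the exact loss form. A minimal sketch of the common formulation follows: the one-hot target is replaced by 1 - ε on the true class plus ε/K spread uniformly over all K classes. The value ε = 0.1 and the function names are assumptions for illustration, not the paper's settings.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of class logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def label_smoothing_ce(logits, target, epsilon=0.1):
    """Cross-entropy against the smoothed target q = (1 - epsilon) * one_hot + epsilon / K.
    Expanding -sum(q * log p) gives a mix of the usual NLL and the uniform-target term."""
    k = len(logits)
    log_probs = log_softmax(logits)
    nll = -log_probs[target]         # ordinary cross-entropy on the true class
    uniform = -sum(log_probs) / k    # cross-entropy against the uniform distribution
    return (1.0 - epsilon) * nll + epsilon * uniform


# A confident, correct prediction is penalized slightly more once smoothing is on.
print(label_smoothing_ce([10.0, 0.0, 0.0], target=0, epsilon=0.0))
print(label_smoothing_ce([10.0, 0.0, 0.0], target=0, epsilon=0.1))
```

This is the same target construction as the `label_smoothing` argument of PyTorch's `torch.nn.CrossEntropyLoss`; with `epsilon=0.0` it reduces to ordinary cross-entropy.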