
Computer Engineering ›› 2024, Vol. 50 ›› Issue (12): 254-264. doi: 10.19678/j.issn.1000-3428.0068731

• Graphics and Image Processing •

Food Image Classification Based on Multi-Feature Fusion

YE Zhipeng, JIANG Feng*

  1. Taizhou Institute of Science and Technology, Nanjing University of Science and Technology, Taizhou 225300, Jiangsu, China
  • Received: 2023-10-30 Online: 2024-04-02 Published: 2024-12-15
  • Contact: JIANG Feng
  • Funding:
    General Program of Natural Science Research of Jiangsu Higher Education Institutions (19KJB520038); Jiangsu Province "333 Talent Project"

Abstract:

With rising living standards, demand for healthy diets keeps growing, and food image recognition has become an active research topic. Differences in processing and cooking cause foods of the same category to vary in shape and color, while foods of different categories may look alike, so food images are harder to recognize than general images. To address this, a multi-feature fusion network for food image classification, MTFNet, is proposed. First, the RGB color channels of the image are fused with the texture features given by the Local Binary Pattern (LBP) and fed to the backbone Squeeze-and-Excitation Network (SENet). Next, a detail attention module mines the weight of each channel at different positions and uses these weights to locally enhance the feature map of each layer, improving its local representation ability. Then, a self-attention mechanism computes self-attention weights between the channels of the feature map, capturing correlations between feature maps and extracting the global features of the image. Finally, the locally enhanced features and the global features are concatenated and fused for classification. Experimental results on the ETH Food101, ChineseFoodNet, and ISIA Food-500 food image datasets show that, compared with the current best model, the Multi-scale Jigsaw Reconstruction Network (MJR-Net), MTFNet improves Top-1 accuracy by 0.44, 1.01, and 0.66 percentage points, respectively, achieving better recognition performance.

Keywords: food image classification, Local Binary Pattern (LBP), Squeeze-and-Excitation Network (SENet), detail attention, self-attention
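
Two steps named in the abstract, the fusion of RGB channels with an LBP texture map and the channel-wise self-attention, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the LBP variant, the fusion operator, or the attention formulation, so the basic 3x3 LBP, the four-channel stacking, the projection-free attention, and all function names below are assumptions chosen for illustration. The detail attention module is left out because the abstract does not specify it.

```python
import numpy as np


def lbp_8neighbour(gray: np.ndarray) -> np.ndarray:
    """Basic 3x3 LBP code per pixel of a 2-D array (assumed variant, edge-padded)."""
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    # Offsets of the 8 neighbours, clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros((h, w), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= gray).astype(np.uint8) * np.uint8(1 << bit)
    return codes


def fuse_rgb_lbp(rgb: np.ndarray) -> np.ndarray:
    """Stack R, G, B and an LBP map into an (H, W, 4) float array in [0, 1].

    Channel stacking is an assumed fusion operator; the abstract only says
    the colour data and LBP texture features are fused as the backbone input.
    """
    rgb_f = rgb.astype(np.float32)
    gray = rgb_f @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    lbp = lbp_8neighbour(gray).astype(np.float32)
    return np.concatenate([rgb_f, lbp[..., None]], axis=-1) / 255.0


def channel_self_attention(feat: np.ndarray) -> np.ndarray:
    """Mix the channels of a (C, H, W) map using channel-to-channel softmax weights.

    Hypothetical, projection-free form: no learned query/key/value matrices.
    """
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)                    # each channel as one vector
    scores = flat @ flat.T / np.sqrt(h * w)          # (C, C) channel similarity
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax per row
    return (weights @ flat).reshape(c, h, w)


# Illustrative usage on random data (shapes are arbitrary).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
fused_input = fuse_rgb_lbp(image)                    # R, G, B, LBP channels
features = rng.standard_normal((16, 4, 4)).astype(np.float32)
global_features = channel_self_attention(features)
# Stand-in for the final step: concatenate local and global features.
combined = np.concatenate([features, global_features], axis=0)
print(fused_input.shape, combined.shape)
```

In a trained network the attention weights would come from learned layers, and the locally enhanced features would be produced by the detail attention module rather than reused unchanged as they are in this usage example.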