
计算机工程 (Computer Engineering)



MF-cache: A CLIP-Based Multimodal Cache Model for Maize Disease Recognition

  • Published: 2025-10-13


Abstract: Maize is a vital economic crop, widely used in industry, animal husbandry, and grain-oil processing, so timely identification of maize diseases is crucial for safeguarding yield. Deep learning methods such as convolutional neural networks (CNNs) have been widely applied to disease recognition, but most existing methods rely solely on image information, overlooking features from other modalities, and their large parameter counts and high deployment costs hinder practical application. To address these challenges, we propose MF-cache, a lightweight image-text multimodal cache model with only 0.061M parameters that combines low computational cost with high recognition accuracy. The model uses the multimodal pre-trained model CLIP to extract image and text features, which are fused in parallel to build a learnable key-value cache enriched with domain knowledge. A weighted two-stage fusion mechanism then dynamically adjusts each modality's contribution to the classification result, improving both stability and interpretability. To improve robustness, several data augmentation strategies increase sample diversity and mitigate overfitting in low-data scenarios. Experimental results on the self-constructed CornI&T dataset and the public PlantVillage dataset show that the method achieves accuracies of 99.72% and 98.80%, respectively, demonstrating strong generalization. These results indicate that the proposed method delivers high recognition performance while maintaining low computational overhead, offering an efficient and practical solution for crop disease detection, and they highlight the potential of combining multimodal pre-trained models with few-shot learning in intelligent agricultural applications.
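The abstract does not specify the exact fusion rule or cache update, so the following is only a minimal NumPy sketch of the general idea of a CLIP-style key-value cache classifier with two-stage fusion: keys are fused image-text features of the few-shot training samples, values are one-hot labels, and a query's cache logits are blended with CLIP's zero-shot logits. All names and hyperparameters here (`w_img`, `alpha`, `beta`, the weighted-sum fusion) are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project features onto the unit sphere so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def build_cache(image_feats, text_feats, labels, num_classes, w_img=0.5):
    # Keys: parallel fusion of CLIP image and text features.
    # A weighted sum is used here as an ASSUMED fusion rule.
    keys = l2_normalize(w_img * image_feats + (1.0 - w_img) * text_feats)
    # Values: one-hot class labels of the few-shot training samples.
    values = np.eye(num_classes)[labels]
    return keys, values

def classify(query_feat, keys, values, clip_logits, alpha=1.0, beta=5.5):
    # Stage 1: affinity between the query and every cached key.
    q = l2_normalize(query_feat)
    affinity = np.exp(-beta * (1.0 - q @ keys.T))
    # Cached one-hot values turn affinities into per-class votes.
    cache_logits = affinity @ values
    # Stage 2: weighted fusion with CLIP's zero-shot logits.
    return clip_logits + alpha * cache_logits
```

In this sketch the cache itself carries the domain knowledge: classifying a query reduces to similarity lookups against stored key-value pairs, so the only trainable tensors would be the keys, which is consistent with the very small parameter count the abstract reports.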