MF-cache: CLIP-Based Multimodal Cache Model for Maize Disease Recognition

doi:10.19678/j.issn.1000-3428.0252659

Abstract

Maize is a vital economic crop that is widely used in industry, animal husbandry, and grain and oil processing, so timely identification of maize diseases is crucial for ensuring a stable yield. Deep learning methods such as Convolutional Neural Networks (CNNs) have been widely applied to disease recognition. However, most existing methods rely solely on image information and overlook features from other modalities, and their large parameter counts and high deployment costs hinder practical application. To address these challenges, we propose MF-cache, a lightweight image-text multimodal cache model that contains only 61 000 parameters and combines low computational cost with high recognition accuracy. The model uses the multimodal pre-trained model CLIP to extract image and text features, which are fused in parallel to build a learnable key-value cache enriched with domain knowledge. In addition, a weighted two-stage fusion mechanism dynamically adjusts the contribution of each modality to the classification outcome, enhancing both stability and interpretability. To improve robustness, several data augmentation strategies are employed to increase sample diversity and mitigate overfitting in low-data scenarios. Experiments on a self-constructed dataset, CornI&T, and on the public PlantVillage dataset yield accuracies of 99.72% and 98.80%, respectively. These results indicate that the method achieves strong recognition performance while maintaining a low computational overhead, offering an efficient and practical solution for crop disease detection. They also highlight the potential of combining multimodal pre-trained models with few-shot learning in intelligent agricultural applications.

Key words: maize disease recognition, multimodal cache, pre-trained model, CLIP model, few-shot
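The abstract does not give the model's equations, so the following is only a minimal sketch of how a fused image-text key-value cache could be queried, assuming a Tip-Adapter-style formulation. Random vectors stand in for CLIP embeddings, and every name and constant here (`build_cache`, `predict`, `w_img`, `alpha`, `beta`, the logit scale of 100) is a hypothetical choice for illustration, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def build_cache(img_feats, txt_feats, labels, num_classes, w_img=0.5):
    """Stage 1 (assumed): weighted parallel fusion of image and text features.

    Keys are the fused, re-normalized features of the few-shot samples;
    values are their one-hot labels.
    """
    fused = w_img * l2_normalize(img_feats) + (1.0 - w_img) * l2_normalize(txt_feats)
    keys = l2_normalize(fused)
    values = np.eye(num_classes)[labels]
    return keys, values

def predict(query_img, class_txt, keys, values, alpha=1.0, beta=5.5):
    """Stage 2 (assumed): weighted sum of zero-shot and cache logits."""
    q = l2_normalize(query_img)
    # Zero-shot branch: similarity between the query and each class text feature.
    zero_shot = 100.0 * q @ l2_normalize(class_txt).T
    # Cache branch: similarity to every key, sharpened by beta, then
    # aggregated per class through the one-hot values.
    affinity = q @ keys.T
    cache = np.exp(-beta * (1.0 - affinity)) @ values
    return (zero_shot + alpha * cache).argmax(axis=-1)

# Synthetic stand-ins for CLIP embeddings: 3 classes, 4 shots each, 16 dimensions.
rng = np.random.default_rng(0)
num_classes, shots, dim = 3, 4, 16
prototypes = np.eye(num_classes, dim)
train_labels = np.repeat(np.arange(num_classes), shots)
train_img = prototypes[train_labels] + 0.05 * rng.standard_normal((num_classes * shots, dim))
class_txt = prototypes + 0.05 * rng.standard_normal((num_classes, dim))

keys, values = build_cache(train_img, class_txt[train_labels], train_labels, num_classes)

query_labels = np.array([0, 1, 2, 1])
query_img = prototypes[query_labels] + 0.05 * rng.standard_normal((len(query_labels), dim))
preds = predict(query_img, class_txt, keys, values)
print(preds)
```

In this sketch `w_img` controls how much each modality contributes to the cache keys and `alpha` controls how much the cache contributes relative to the zero-shot prediction. A learnable cache, as the abstract describes, would presumably treat the keys as trainable parameters initialized this way, but that training step is not shown here.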

SUN Wei, CHEN Junjie. MF-cache: CLIP-Based Multimodal Cache Model for Maize Disease Recognition[J]. Computer Engineering, 2026, 52(3): 420-428.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0252659

https://www.ecice06.com/EN/Y2026/V52/I3/420

Figures/Tables 10

Fig.1 Structure of the MF-cache model

Fig.2 Comparison between the multimodal cache (cacheModel) and the vision-only single-modal cache (cacheModel_vision)

Fig.3 Comparison of sample distributions before and after training

Fig.4 Confusion matrix of the experimental results of MF-cache on the maize subset of PlantVillage

