Changqing Yan, Zeyun Liang, Han Cheng, Shuyang Li, Guangpeng Yang, Zhiwei Li, Ling Yin, Junjie Qu, Jing Wang, Genghong Wu, Qi Tian, Qiang Yu, Gang Zhao
Journal: Computers and Electronics in Agriculture, Volume 236, Article 110442 (impact factor 7.7; JCR Q1, Agriculture, Multidisciplinary)
Published: 2025-04-25
DOI: 10.1016/j.compag.2025.110442
URL: https://www.sciencedirect.com/science/article/pii/S0168169925005484
CDIP-ChatGLM3: A dual-model approach integrating computer vision and language modeling for crop disease identification and prescription
Deep learning (DL) models have shown exceptional accuracy in plant disease identification, yet their practical utility for farmers remains limited because they do not provide professional, actionable guidance. To bridge this gap, we developed CDIP-ChatGLM3, a framework that combines a state-of-the-art DL-based computer vision model with a fine-tuned large language model (LLM), designed specifically for Crop Disease Identification and Prescription (CDIP). EfficientNet-B2, evaluated among 10 DL models across 48 diseases and 13 crops, achieved the top performance, with an accuracy of 97.97% ± 0.16% at a 95% confidence level. Building on this, we fine-tuned the widely used ChatGLM3-6B LLM using Low-Rank Adaptation (LoRA) and Freeze-tuning to improve its ability to deliver precise disease management prescriptions. We compared two training strategies, multi-task learning (MTL) and Dual-stage Mixed Fine-Tuning (DMT), using different combinations of domain-specific and general datasets. Freeze-tuning with DMT led to substantial performance gains, achieving a 33.16% improvement in BLEU-4 and a 27.04% increase in the average ROUGE F-score, surpassing the original model and state-of-the-art competitors such as Qwen-max, Llama-3.1-405B-Instruct, and GPT-4o. The dual-model architecture of CDIP-ChatGLM3 leverages the complementary strengths of computer vision for image-based disease detection and LLMs for contextualized, domain-specific text generation, offering unmatched specialization, interpretability, and scalability. Unlike resource-intensive multimodal models that blend modalities, our dual-model approach maintains efficiency while achieving superior performance in both disease identification and actionable prescription generation.
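The abstract does not say how the vision model's output is passed to the language model, so the following is only a minimal sketch of one plausible handoff in a dual-model pipeline: the classifier's top class is converted into a text prompt, and the prompt is sent to a text generator. The class-name format (`crop___disease`), the prompt wording, and every name in the code (`Diagnosis`, `pick_top_class`, `build_prompt`, `prescribe`, `echo_llm`) are hypothetical and are not taken from the paper; the probabilities are made-up illustration values, and the stand-in generator replaces a real fine-tuned LLM.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Diagnosis:
    """Result of the vision stage (hypothetical structure)."""
    crop: str
    disease: str
    confidence: float


def pick_top_class(probabilities: Sequence[float],
                   class_names: Sequence[str]) -> Diagnosis:
    """Select the most probable class and split it into crop and disease.

    Assumes class names of the form 'crop___disease' (an assumption made
    for this sketch, not a detail from the paper).
    """
    if len(probabilities) != len(class_names):
        raise ValueError("probabilities and class_names must have equal length")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    crop, disease = class_names[best].split("___", 1)
    return Diagnosis(crop, disease.replace("_", " "), probabilities[best])


def build_prompt(diagnosis: Diagnosis) -> str:
    """Turn the diagnosis into a text prompt for the language stage."""
    return (
        f"The crop is {diagnosis.crop} and the identified disease is "
        f"{diagnosis.disease}. Describe the symptoms and recommend "
        "management measures."
    )


def prescribe(probabilities: Sequence[float],
              class_names: Sequence[str],
              generate: Callable[[str], str]) -> str:
    """Run the two stages in sequence: classify, then generate text."""
    diagnosis = pick_top_class(probabilities, class_names)
    return generate(build_prompt(diagnosis))


# Stand-ins for the two models (illustrative values only).
class_names = ["tomato___late_blight", "tomato___healthy", "grape___black_rot"]
probabilities = [0.91, 0.03, 0.06]


def echo_llm(prompt: str) -> str:
    """Placeholder for a fine-tuned LLM; it only echoes the prompt."""
    return "[LLM reply to] " + prompt


print(prescribe(probabilities, class_names, echo_llm))
```

Because the two stages communicate only through plain text, either model could in principle be swapped or retrained without touching the other, which is one way to read the efficiency and interpretability argument made above.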
About the journal:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and applications notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.