CDIP-ChatGLM3: A dual-model approach integrating computer vision and language modeling for crop disease identification and prescription

Impact factor: 7.7 | Zone 1 (Agriculture and Forestry Sciences) | JCR Q1 (AGRICULTURE, MULTIDISCIPLINARY)

Computers and Electronics in Agriculture, vol. 236, Article 110442. Published: 2025-04-25. DOI: 10.1016/j.compag.2025.110442
https://www.sciencedirect.com/science/article/pii/S0168169925005484

Changqing Yan, Zeyun Liang, Han Cheng, Shuyang Li, Guangpeng Yang, Zhiwei Li, Ling Yin, Junjie Qu, Jing Wang, Genghong Wu, Qi Tian, Qiang Yu, Gang Zhao

Citations: 0

Abstract

Deep learning (DL) models have shown exceptional accuracy in plant disease identification, yet their practical utility for farmers remains limited due to a lack of professional, actionable guidance. To bridge this gap, we developed CDIP-ChatGLM3, a framework that couples a state-of-the-art DL-based computer vision model with a fine-tuned large language model (LLM), designed specifically for Crop Disease Identification and Prescription (CDIP). Among 10 DL models evaluated on 48 diseases across 13 crops, EfficientNet-B2 performed best, with an accuracy of 97.97% ± 0.16% at a 95% confidence level. Building on this, we fine-tuned the widely used ChatGLM3-6B LLM with Low-Rank Adaptation (LoRA) and Freeze-tuning to improve its ability to deliver precise disease management prescriptions. We compared two training strategies, multi-task learning (MTL) and Dual-stage Mixed Fine-Tuning (DMT), using different combinations of domain-specific and general datasets. Freeze-tuning with DMT led to substantial performance gains, improving BLEU-4 by 33.16% and the average ROUGE F-score by 27.04%, and surpassing both the original model and state-of-the-art competitors such as Qwen-max, Llama-3.1-405B-Instruct, and GPT-4o. The dual-model architecture of CDIP-ChatGLM3 leverages the complementary strengths of computer vision for image-based disease detection and of LLMs for contextualized, domain-specific text generation, offering unmatched specialization, interpretability, and scalability. Unlike resource-intensive multimodal models that blend modalities, our dual-model approach maintains efficiency while achieving superior performance in both disease identification and actionable prescription generation.
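The dual-model design described in the abstract, in which an image classifier's predicted disease label is handed as text to a fine-tuned LLM, can be illustrated with a minimal sketch. This is not the authors' code. The class list in `LABELS`, the prompt wording in `build_prompt`, the `min_conf` threshold, and the `generate` callback (a stand-in for a call to a fine-tuned ChatGLM3-6B) are all hypothetical; in practice the logits would come from a trained classifier such as EfficientNet-B2.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical (crop, disease) classes; the paper covers 48 diseases across
# 13 crops, but these three entries are placeholders, not its class list.
LABELS: Sequence[Tuple[str, str]] = [
    ("Tomato", "early blight"),
    ("Tomato", "late blight"),
    ("Grape", "black rot"),
]


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def identify(logits: Sequence[float],
             labels: Sequence[Tuple[str, str]]) -> Tuple[Tuple[str, str], float]:
    """Stage 1 (vision): map classifier logits to the top class and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def build_prompt(label: Tuple[str, str]) -> str:
    """Hand-off: only the predicted label, as plain text, reaches the language model."""
    crop, disease = label
    return (f"Crop: {crop}. Diagnosed disease: {disease}. "
            "Recommend a disease management prescription.")


def diagnose_and_prescribe(logits: Sequence[float],
                           labels: Sequence[Tuple[str, str]],
                           generate: Callable[[str], str],
                           min_conf: float = 0.5) -> str:
    """Stage 2 (language): call the LLM only when the diagnosis is confident enough.

    `generate` stands in for a fine-tuned LLM call (hypothetical); `min_conf`
    is an illustrative threshold, not a value reported in the paper.
    """
    label, conf = identify(logits, labels)
    if conf < min_conf:
        return "Low-confidence diagnosis: please retake the photo."
    return generate(build_prompt(label))


if __name__ == "__main__":
    # A fake LLM that simply echoes its prompt, so the sketch runs offline.
    fake_llm = lambda prompt: f"[LLM reply to] {prompt}"
    print(diagnose_and_prescribe([0.1, 2.0, -1.0], LABELS, fake_llm))
```

In this sketch only a short text label crosses the boundary between the two models, so either one could be retrained or swapped without touching the other; this is one reading of the modularity and efficiency the abstract claims for the dual-model approach over blended multimodal models.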

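Source journal

Computers and Electronics in Agriculture (Engineering and Technology: Computer Science, Interdisciplinary Applications)

CiteScore: 15.30
Self-citation rate: 14.50%
Publication volume: 800 articles
Review time: 62 days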

Journal description: Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and application notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics such as agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.