Optimizing dataset diversity for a robust deep-learning model in rice blast disease identification to enhance crop health assessment across diverse conditions

Impact factor: 6.3 · JCR Q1 (Agricultural Engineering)
Reuben Alfred, Judith Leo, Shubi Felix Kaijage
Journal: Smart Agricultural Technology, Volume 10, Article 100726
Published: 2024-12-16
DOI: 10.1016/j.atech.2024.100726
Full text: https://www.sciencedirect.com/science/article/pii/S2772375524003307

Abstract

Magnaporthe oryzae, the pathogen that causes rice blast disease, poses a significant global threat to rice production. The disease can cause yield losses exceeding 30% in susceptible rice varieties. More effective detection methods are urgently needed because traditional methods, which rely primarily on visual inspection, are time-consuming and error-prone. Deep-learning models offer effective solutions for disease identification because they can analyze large datasets. However, the diversity of the training dataset is critical to a model's performance and generalizability. This study evaluated the impact of dataset diversity on model performance and generalizability by developing two models, referred to as the High-Diverse Model and the Low-Diverse Model. The High-Diverse Model was trained on a diverse dataset comprising images from different geographical regions, rice species, environmental conditions, plant growth stages, and disease severity levels. In contrast, the Low-Diverse Model was trained on a dataset with much less variability. The High-Diverse Model significantly outperformed the Low-Diverse Model, achieving a training accuracy of 95.26% and a validation accuracy of 94.43%, which indicates effective generalization. The Low-Diverse Model achieved 98.37% accuracy on the training data but only 35.38% on the validation data, which indicates severe overfitting associated with limited dataset diversity. These results highlight the importance of dataset diversity in developing effective and scalable deep-learning models for crop health assessment.
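The overfitting argument rests on the gap between training and validation accuracy, so a short arithmetic sketch may help make it concrete. The accuracies below are the ones reported in the abstract; the function names and the 5-point `GAP_THRESHOLD` are hypothetical choices made for illustration and do not come from the paper.

```python
# Accuracies (in percent) as reported in the abstract.
REPORTED = {
    "High-Diverse Model": {"train": 95.26, "val": 94.43},
    "Low-Diverse Model": {"train": 98.37, "val": 35.38},
}

# Hypothetical cutoff in percentage points; an assumption, not from the paper.
GAP_THRESHOLD = 5.0


def generalization_gap(train_acc, val_acc):
    """Return training accuracy minus validation accuracy, in percentage points."""
    return round(train_acc - val_acc, 2)


def summarize(reported, threshold=GAP_THRESHOLD):
    """Compute the gap for each model and flag gaps above the threshold."""
    summary = {}
    for name, acc in reported.items():
        gap = generalization_gap(acc["train"], acc["val"])
        summary[name] = {"gap": gap, "overfit": gap > threshold}
    return summary


if __name__ == "__main__":
    for name, row in summarize(REPORTED).items():
        label = "likely overfitting" if row["overfit"] else "generalizes"
        print(f"{name}: gap = {row['gap']:.2f} points ({label})")
```

With the reported figures, the High-Diverse Model shows a gap of 0.83 percentage points, while the Low-Diverse Model shows a gap of 62.99 points. Any reasonable threshold would separate the two, so the exact cutoff chosen here matters little.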