Automatic granary sweeping strategy using visual large language model

IF 2.7 | Zone 2, Agricultural and Forestry Sciences | JCR Q1, Entomology
Boqiang Zhang, Jinhao Yan, Yuhe Gao, GenLiang Yang, Kunpeng Zhang, Junwu Li
Journal of Stored Products Research, volume 112, Article 102619. Published 2025-03-10.
DOI: 10.1016/j.jspr.2025.102619
https://www.sciencedirect.com/science/article/pii/S0022474X25000785

Abstract

Food security is fundamental to human survival, so reducing grain losses and ensuring grain quality are of great practical importance. Making granaries more intelligent matters in particular because residual grain sweeping currently suffers from inefficient manual work, incomplete coverage, and expensive equipment. This work proposes a new method, the Residual Grain Sweeping Visual Large Model (RGSVLM), based on a Visual Large Language Model (VLLM). First, we constructed a semantic dataset of images showing various residual grain dispersal patterns, captured in real granary environments. We also introduced an improved version of the Fast Segment Anything Model (FastSAM) algorithm to detect residual grains in the field images, extract visual features, and achieve accurate segmentation. In addition, we designed prompts that combine the image data to produce corresponding textual datasets that reflect real-world conditions. Next, we integrated this dataset with a chain-of-reasoning framework to fine-tune the visual large language model for specific tasks. This approach compensates for the original model's limitations in logical reasoning, enabling it to simulate human thought processes and generate clear and reasonable answers. In a granary environment, RGSVLM performs better than other models. The development and implementation of RGSVLM offer innovative concepts and techniques for building intelligent granaries.
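The abstract does not say how segmentation output is turned into text, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a binary mask (1 = residual grain, 0 = bare floor) is already available from some segmenter, splits it into a grid, and writes a prompt/answer record with explicit reasoning steps. The names `region_coverage` and `build_record`, the 2 x 2 grid, and the 0.25 threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List

# Assumed encoding: 1 = residual grain pixel, 0 = bare floor.
Mask = List[List[int]]


def region_coverage(mask: Mask, rows: int = 2, cols: int = 2) -> Dict[str, float]:
    """Split the mask into a rows x cols grid and return the grain fraction per cell.

    Assumes the mask is at least rows x cols pixels, so no cell is empty.
    """
    height, width = len(mask), len(mask[0])
    coverage = {}
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * height // rows, (r + 1) * height // rows
            x0, x1 = c * width // cols, (c + 1) * width // cols
            cell = [mask[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            coverage[f"r{r}c{c}"] = sum(cell) / len(cell)
    return coverage


def build_record(mask: Mask, threshold: float = 0.25) -> Dict[str, str]:
    """Turn per-region coverage into a prompt/answer pair with reasoning steps."""
    coverage = region_coverage(mask)
    # Rank regions by descending coverage so the densest area comes first.
    ranked = sorted(coverage.items(), key=lambda kv: kv[1], reverse=True)
    targets = [name for name, frac in ranked if frac >= threshold]
    steps = [
        f"Step {i + 1}: region {name} has {frac:.0%} grain coverage."
        for i, (name, frac) in enumerate(ranked)
    ]
    if targets:
        answer = "Sweep regions in this order: " + ", ".join(targets) + "."
    else:
        answer = "No region exceeds the threshold; no sweeping needed."
    return {
        "prompt": "Given the segmented floor image, which regions should be swept?",
        "reasoning": "\n".join(steps),
        "answer": answer,
    }


# Hypothetical 4 x 4 mask: a dense pile top-left, one stray pixel bottom-right.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
record = build_record(mask)
print(record["reasoning"])
print(record["answer"])
```

Records of this shape (prompt, intermediate reasoning, answer) are one plausible way to pair image-derived facts with text for fine-tuning; the actual dataset format used for RGSVLM is not described in the abstract.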
Source journal
CiteScore: 5.70
Self-citation rate: 18.50%
Articles published: 112
Review time: 45 days
Journal description: The Journal of Stored Products Research provides an international medium for the publication of both reviews and original results from laboratory and field studies on the preservation and safety of stored products, notably food stocks, covering storage-related problems from the producer through the supply chain to the consumer. Stored products are characterised by having relatively low moisture content and include raw and semi-processed foods, animal feedstuffs, and a range of other durable items, including materials such as clothing or museum artefacts.