Automatic granary sweeping strategy using visual large language model

IF 2.7 | Zone 2 (Agricultural and Forestry Sciences) | JCR Q1 (Entomology)

Journal of Stored Products Research | Pub Date: 2025-03-10 | DOI: 10.1016/j.jspr.2025.102619

Boqiang Zhang, Jinhao Yan, Yuhe Gao, GenLiang Yang, Kunpeng Zhang, Junwu Li

Volume 112, Article 102619. Full text: https://www.sciencedirect.com/science/article/pii/S0022474X25000785

Citations: 0

Abstract

Food security is fundamental to human survival, so reducing grain losses and ensuring grain quality are of great practical importance. Making granaries more intelligent is particularly important because residual grain sweeping currently suffers from inefficient manual work, incomplete coverage, and expensive equipment. This work proposes a new method, the Residual Grain Sweeping Visual Large Model (RGSVLM), based on a Visual Large Language Model (VLLM). First, we constructed a semantic dataset containing images of various residual grain dispersal patterns captured in real granary environments. We also introduced an improved version of the Fast Segment Anything Model (FastSAM) algorithm to detect residual grains in the field images, extract visual features, and achieve accurate segmentation. In addition, we designed prompts that combine the image data to produce corresponding textual datasets that reflect the real-world situation. Next, we integrated this dataset with a chain-of-reasoning framework to fine-tune the visual large language model for specific tasks. This approach compensates for the original model's limitations in logical reasoning, enabling it to simulate human thought processes and generate clear and reasonable answers. In a granary environment, RGSVLM performs better than other models. The development and implementation of RGSVLM in this study offer new concepts and techniques for building intelligent granaries.
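
The abstract gives no implementation details, so the following is only an illustrative sketch of one step it describes: turning a segmentation result into a textual prompt that asks for step-by-step reasoning. The mask encoding, the function names (`mask_stats`, `region_name`, `build_prompt`), and the prompt wording are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Tuple

# Hypothetical encoding: 1 = residual grain pixel, 0 = bare floor.
Mask = List[List[int]]


def mask_stats(mask: Mask) -> Tuple[float, Tuple[float, float]]:
    """Return (coverage fraction, (row, col) centroid) of the grain pixels."""
    rows, cols = len(mask), len(mask[0])
    points = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    if not points:
        # No grain detected: report zero coverage and the image centre.
        return 0.0, (rows / 2, cols / 2)
    coverage = len(points) / (rows * cols)
    centroid = (
        sum(r for r, _ in points) / len(points),
        sum(c for _, c in points) / len(points),
    )
    return coverage, centroid


def region_name(centroid: Tuple[float, float], shape: Tuple[int, int]) -> str:
    """Map a centroid to a coarse region label such as 'upper left'."""
    r, c = centroid
    rows, cols = shape
    vertical = "upper" if r < rows / 2 else "lower"
    horizontal = "left" if c < cols / 2 else "right"
    return f"{vertical} {horizontal}"


def build_prompt(mask: Mask) -> str:
    """Compose a text prompt (hypothetical wording) from mask statistics."""
    coverage, centroid = mask_stats(mask)
    region = region_name(centroid, (len(mask), len(mask[0])))
    return (
        f"Residual grain covers {coverage:.0%} of the image, "
        f"concentrated in the {region} region.\n"
        "Reason step by step: (1) describe the dispersal pattern, "
        "(2) decide where sweeping should start, "
        "(3) propose a sweeping path."
    )


# Toy 4x4 mask standing in for a real segmentation output.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(build_prompt(mask))
```

In practice the mask would come from a segmentation model rather than a hand-written grid, and the resulting text would be paired with the image as one training example; both of those steps are outside this sketch.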

Source journal

Journal of Stored Products Research (Biology: Entomology)

CiteScore: 5.70

Self-citation rate: 18.50%

Articles published: 112

Review time: 45 days

Journal description: The Journal of Stored Products Research provides an international medium for the publication of both reviews and original results from laboratory and field studies on the preservation and safety of stored products, notably food stocks, covering storage-related problems from the producer through the supply chain to the consumer. Stored products are characterised by having relatively low moisture content and include raw and semi-processed foods, animal feedstuffs, and a range of other durable items, including materials such as clothing or museum artefacts.