Ziang Zhao, Yulia Hicks, Xianfang Sun, Chaoxi Luo
Smart Agricultural Technology, Volume 12, Article 101068. Published 2025-06-05.
DOI: 10.1016/j.atech.2025.101068
https://www.sciencedirect.com/science/article/pii/S2772375525003016
FruitQuery: A lightweight query-based instance segmentation model for in-field fruit ripeness determination
Accurate instance segmentation of fruit at different ripeness stages is critical for developing autonomous harvesting robots, particularly under unstructured in-field conditions. In this paper, we combine two in-field fruit datasets, one of peaches and one of strawberries, for multi-stage ripeness determination, and propose a lightweight query-based instance segmentation model named FruitQuery.
The combined dataset contains 3 peach ripeness stages and 4 strawberry ripeness stages, covering a variety of unstructured conditions for two popular fruits. FruitQuery consists of three parts: a backbone, a pixel decoder, and Transformer decoders. Efficient multi-head self-attention modules are introduced into the backbone to reduce computational overhead, and a pyramid pooling module is added to the pixel decoder to enhance multi-scale feature fusion. The Transformer decoders then learn a fixed number of queries from the features and generate instance masks, avoiding postprocessing such as non-maximum suppression. FruitQuery runs end to end and combines convolution with Transformer components to capture fine-grained features of different fruits at different ripeness stages.
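To illustrate how a fixed set of queries can yield instance masks without non-maximum suppression, the following is a minimal sketch in plain Python. It is not FruitQuery's implementation: the function name `predict_instances`, the class labels, the "no object" slot placed last in each probability vector, and the 0.5 mask threshold are all assumptions made for illustration. Each query carries an embedding and a class distribution; its mask is obtained by taking the dot product of the embedding with every per-pixel feature vector, and queries classified as "no object" are simply dropped.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_instances(query_embeds, class_probs, pixel_feats, class_names,
                      mask_thresh=0.5):
    """Turn per-query outputs into instances (hypothetical helper).

    query_embeds: Q embeddings, each of length C
    class_probs:  Q probability vectors of length len(class_names) + 1;
                  the last slot is assumed to mean "no object"
    pixel_feats:  H x W grid of per-pixel feature vectors of length C
    """
    no_object = len(class_names)
    instances = []
    for embed, probs in zip(query_embeds, class_probs):
        best = max(range(len(probs)), key=probs.__getitem__)
        if best == no_object:
            continue  # an unused query is discarded; no NMS step is needed
        # Mask logit at each pixel = dot(query embedding, pixel feature).
        mask = [
            [sigmoid(sum(e * f for e, f in zip(embed, feat))) > mask_thresh
             for feat in row]
            for row in pixel_feats
        ]
        instances.append(
            {"label": class_names[best], "score": probs[best], "mask": mask}
        )
    return instances


# Toy inputs: a 2 x 2 feature map with C = 2 and three queries.
class_names = ["peach_stage_1", "strawberry_stage_2"]  # hypothetical labels
pixel_feats = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 0.0]],
]
query_embeds = [[4.0, -4.0], [-4.0, 4.0], [0.0, 0.0]]
class_probs = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.10, 0.10, 0.80],  # most likely "no object"
]

instances = predict_instances(query_embeds, class_probs, pixel_feats,
                              class_names)
for inst in instances:
    print(inst["label"], inst["score"], inst["mask"])
```

In this toy setup the third query is classified as "no object" and contributes nothing, so the number of output instances is bounded by the number of queries and duplicates do not need to be filtered by a separate suppression step.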
Extensive experiments on the combined fruit dataset demonstrate that FruitQuery achieves the highest average precision, 67.02, with only 14.08M parameters, outperforming 13 state-of-the-art models across 33 variants. Notably, FruitQuery surpasses three YOLO series (v8, v9 and v10) by a large margin. Ablation studies and visualizations also show robust feature extraction with fewer parameters, indicating that the query-based design is effective in localizing fruit. These results highlight FruitQuery's compelling balance between segmentation performance and model size, offering potential for in-field application.