ET-PatchNet: A low-memory, efficient model for Multi-view Stereo with a case study on the 3D reconstruction of fruit tree branches
Hao Wan, Xilei Zeng, Zeming Fan, Qinhu Chen, Ke Zhang, Han Zhang
Computers and Electronics in Agriculture, Vol. 237, Article 110459 (2025). DOI: 10.1016/j.compag.2025.110459
Abstract
The achievement of robotic fruit harvesting in intelligent farming depends heavily on the precise reconstruction of the tree branch structures that guide harvesting arm movements. However, existing research in this field often grapples with computational inefficiency and high cost. In response, we designed ET-PatchNet, a low-memory neural network for generating depth maps within the Multi-view Stereo (MVS) pipeline, enabling efficient 3D reconstruction of branches. This highly Efficient network is based on the Transformer and PatchmatchNet. ET-PatchNet incorporates an efficient backbone with Transformer-based self-attention and cross-attention mechanisms, which enriches global and 3D-consistency information and improves depth prediction accuracy and generalization. Furthermore, an adaptive depth resampling method has been developed and embedded in an iterative, coarse-to-fine depth regression architecture based on learnable patches to minimize memory usage. To further strengthen the representation of depth features, an auxiliary task has been integrated. Experimental results on the DTU and Tanks & Temples datasets show that ET-PatchNet outperforms its competitors in completeness, computational efficiency, and memory usage. When predicting a single depth map at a resolution of 1152 × 864 pixels, inference takes only 0.13 s with a memory footprint of just 2824 MB. Moreover, the 3D structure of observable branches on apple trees has been effectively reconstructed by fine-tuning our model on the BlendedMVS dataset. The mean and variance of the distances between our reconstructed branch points and reference points are only 0.0292 mm and 0.0187 mm², respectively. In conclusion, ET-PatchNet is well suited for integration into mobile embedded fruit-harvesting equipment and exhibits significant potential for a wide range of applications.
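To make the iterative, coarse-to-fine idea from the abstract concrete, below is a minimal, hypothetical PyTorch sketch of adaptive depth hypothesis resampling around a previous-stage depth map, the kind of step such a pipeline relies on. The function name, the interval-narrowing rule, and the hypothesis count are illustrative assumptions of ours and do not reproduce ET-PatchNet's actual resampling scheme.

```python
# Minimal sketch (assumption): resampling depth hypotheses around a coarse
# depth estimate for the next, finer stage of a coarse-to-fine MVS network.
# The interval rule and counts below are illustrative, not the paper's method.
import torch
import torch.nn.functional as F


def resample_depth_hypotheses(coarse_depth, num_hypotheses=8,
                              interval_scale=0.5, scale_factor=2):
    """Generate refined depth hypotheses around a coarse depth map.

    coarse_depth: (B, H, W) depth predicted at the previous (coarser) stage.
    Returns: (B, num_hypotheses, scale_factor*H, scale_factor*W).
    """
    # Upsample the coarse depth to the finer stage's resolution.
    depth_up = F.interpolate(coarse_depth.unsqueeze(1),
                             scale_factor=scale_factor,
                             mode="bilinear", align_corners=False)

    # Per-pixel search interval, narrowed relative to the previous stage
    # (here simply proportional to the depth value itself).
    interval = interval_scale * depth_up

    # Evenly spaced offsets in [-1, 1] shared by all pixels.
    offsets = torch.linspace(-1.0, 1.0, num_hypotheses,
                             device=coarse_depth.device)
    offsets = offsets.view(1, num_hypotheses, 1, 1)

    # Hypotheses centred on the upsampled depth, kept strictly positive.
    hypotheses = depth_up + offsets * interval
    return hypotheses.clamp(min=1e-3)


if __name__ == "__main__":
    coarse = torch.rand(1, 108, 144) * 5.0 + 1.0   # synthetic coarse depth (metres)
    hyps = resample_depth_hypotheses(coarse)
    print(hyps.shape)  # torch.Size([1, 8, 216, 288])
```

Narrowing the per-pixel search interval at each finer stage is what keeps the number of hypotheses, and therefore the cost-volume memory, small, which is consistent with the low-memory goal stated in the abstract.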
Journal introduction:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and application notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics such as agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.