ET-PatchNet: A low-memory, efficient model for Multi-view Stereo with a case study on the 3D reconstruction of fruit tree branches
Hao Wan, Xilei Zeng, Zeming Fan, Qinhu Chen, Ke Zhang, Han Zhang
Computers and Electronics in Agriculture, Vol. 237, Article 110459 (2025). DOI: 10.1016/j.compag.2025.110459
Abstract
The achievement of robotic fruit harvesting in intelligent farming depends heavily on the precise reconstruction of the tree branch structures that guide harvesting arm movements. However, existing research in this field often grapples with computational inefficiency and high cost. In response, we designed ET-PatchNet, a low-memory neural network for generating depth maps within the Multi-view Stereo (MVS) pipeline, enabling efficient 3D reconstruction of branches. This highly Efficient network is based on the Transformer and PatchmatchNet. ET-PatchNet incorporates an efficient backbone with Transformer-based self-attention and cross-attention mechanisms, which enriches global and 3D-consistency information and improves depth prediction accuracy and generalization. Furthermore, an adaptive depth resampling method has been developed and embedded in an iterative, coarse-to-fine depth regression architecture based on learnable patches to minimize memory usage. To further strengthen the representation of depth features, an auxiliary task has been integrated. Experimental results on the DTU and Tanks & Temples datasets show that ET-PatchNet outperforms its competitors in completeness, computational efficiency, and memory usage. When predicting a single depth map at a resolution of 1152 × 864 pixels, inference takes only 0.13 s with a memory footprint of just 2824 MB. Moreover, the 3D structure of observable branches on apple trees has been effectively reconstructed by fine-tuning our model on the BlendedMVS dataset. The mean and variance of the distances between our reconstructed branch points and reference points are only 0.0292 mm and 0.0187 mm², respectively. In conclusion, ET-PatchNet is well suited for integration into mobile embedded fruit-harvesting equipment and exhibits significant potential for a wide range of applications.
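To make the iterative, coarse-to-fine idea from the abstract concrete, below is a minimal, hypothetical PyTorch sketch of adaptive depth hypothesis resampling around a previous-stage depth map, the kind of step such a pipeline relies on. The function name, the interval-narrowing rule, and the hypothesis count are illustrative assumptions of ours and do not reproduce ET-PatchNet's actual resampling scheme.

```python
# Minimal sketch (assumption): resampling depth hypotheses around a coarse
# depth estimate for the next, finer stage of a coarse-to-fine MVS network.
# The interval rule and counts below are illustrative, not the paper's method.
import torch
import torch.nn.functional as F


def resample_depth_hypotheses(coarse_depth, num_hypotheses=8,
                              interval_scale=0.5, scale_factor=2):
    """Generate refined depth hypotheses around a coarse depth map.

    coarse_depth: (B, H, W) depth predicted at the previous (coarser) stage.
    Returns: (B, num_hypotheses, scale_factor*H, scale_factor*W).
    """
    # Upsample the coarse depth to the finer stage's resolution.
    depth_up = F.interpolate(coarse_depth.unsqueeze(1),
                             scale_factor=scale_factor,
                             mode="bilinear", align_corners=False)

    # Per-pixel search interval, narrowed relative to the previous stage
    # (here simply proportional to the depth value itself).
    interval = interval_scale * depth_up

    # Evenly spaced offsets in [-1, 1] shared by all pixels.
    offsets = torch.linspace(-1.0, 1.0, num_hypotheses,
                             device=coarse_depth.device)
    offsets = offsets.view(1, num_hypotheses, 1, 1)

    # Hypotheses centred on the upsampled depth, kept strictly positive.
    hypotheses = depth_up + offsets * interval
    return hypotheses.clamp(min=1e-3)


if __name__ == "__main__":
    coarse = torch.rand(1, 108, 144) * 5.0 + 1.0   # synthetic coarse depth (metres)
    hyps = resample_depth_hypotheses(coarse)
    print(hyps.shape)  # torch.Size([1, 8, 216, 288])
```

Narrowing the per-pixel search interval at each finer stage is what keeps the number of hypotheses, and therefore the cost-volume memory, small, which is consistent with the low-memory goal stated in the abstract.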
Journal introduction:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and application notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics such as agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.