Improved crop row detection by employing attention-based vision transformers and convolutional neural networks with integrated depth modeling for precise spatial accuracy

Hassan Afzaal, Derek Rude, Aitazaz A. Farooque, Gurjit S. Randhawa, Arnold W. Schumann, Nicholas Krouglicof

Smart Agricultural Technology, vol. 11, Article 100934 (published 2025-04-05). DOI: 10.1016/j.atech.2025.100934
https://www.sciencedirect.com/science/article/pii/S2772375525001674
Abstract
Precision agriculture has emerged as a revolutionary technology for tackling global food security issues by optimizing crop yield and resource management. Incorporating artificial intelligence (AI) within agricultural practices has fundamentally transformed the discipline by facilitating sophisticated data analysis, predictive modeling, and automation. This research presents a novel framework that integrates deep learning, precision agriculture, and depth modeling to accurately detect crop rows and recover spatial information. The proposed framework employs the latest attention- and convolution-based encoders, such as ConvFormer, CAFormer, Swin Transformer, and ConvNextV2, to precisely identify crop rows across varied and challenging agricultural environments. The binary segmentation models were trained on a high-resolution soybean crop dataset (733 images) collected from fifteen distinct locations in Canada during different growth phases. LabelMe was used to annotate the segmentation dataset, and the Albumentations library was used for data augmentation to enhance generalization and robustness. With training (∼70 %, 513 images), validation (∼15 %, 109 images), and test (∼15 %, 111 images) splits, the models learned to differentiate crop rows from background noise, achieving notable accuracy across multiple metrics, including Precision, Recall, F1 Score, and Dice Score.
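The abstract does not publish the preprocessing code; the following is a minimal sketch of a ∼70/15/15 split and an Albumentations augmentation pipeline consistent with the description above. The specific transforms, working resolution, and file names are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch of the dataset split and augmentation step described above.
# Exact transforms and resolution are assumptions, not the paper's settings.
import numpy as np
import albumentations as A
from sklearn.model_selection import train_test_split

# Roughly 70/15/15 split over the 733 images reported in the abstract.
paths = [f"soybean_{i:04d}.png" for i in range(733)]  # hypothetical file names
train_paths, rest = train_test_split(paths, test_size=0.30, random_state=42)
val_paths, test_paths = train_test_split(rest, test_size=0.50, random_state=42)

# Albumentations applies identical spatial transforms to image and mask,
# keeping LabelMe-derived row annotations aligned after augmentation.
train_transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.Resize(512, 512),  # assumed working resolution
])

image = np.zeros((1024, 1024, 3), dtype=np.uint8)  # placeholder RGB frame
mask = np.zeros((1024, 1024), dtype=np.uint8)      # placeholder binary row mask
augmented = train_transform(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]
```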
An essential element of this pipeline is incorporating the Depth Pro model for precise computation of Ground Sampling Distance (GSD) by estimating absolute depth maps and camera height from the images. The depth maps were analyzed to examine GSD variability across fifteen clusters of field images, revealing GSD values ranging from 0.5 to 2.0 mm/pixel for most clusters. The proposed model demonstrates superior performance in crop row segmentation tasks, achieving an F1 Score of 0.8012, Precision of 0.8512, Recall of 0.7584, and Accuracy of 0.8477 on the validation set. In a comparative analysis with state-of-the-art (SOTA) models, ConvFormer outperformed alternatives such as ConvNextV2, CAFormer, and Swin S3 across multiple metrics. Notably, ConvFormer achieves a better balance of precision and recall than ResNet models, which exhibit lower metrics (e.g., an F1 Score of 0.7307 and Recall of 0.6551), underscoring its effectiveness in complex agricultural scenarios. Furthermore, classic machine vision methods were tested for extracting line information from the binary segmentation masks, which can be useful for plant analytics, autonomous driving, and various other applications. The proposed workflow offers a robust solution for automating field operations, optimizing resource efficiency, and improving crop productivity.
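The abstract does not detail the GSD computation or the line-extraction routine; the sketch below shows one plausible reading under the standard pinhole model (ground coverage per pixel ≈ depth / focal length in pixels), with a probabilistic Hough transform as a representative classic machine-vision method. The depth values, focal length, and choice of OpenCV routine are assumptions, not the authors' implementation.

```python
# Sketch of per-pixel GSD from a metric depth map, plus Hough-based line
# extraction from a binary crop-row mask. All values are placeholders.
import numpy as np
import cv2

depth_m = np.full((1024, 1024), 1.5, dtype=np.float32)  # assumed 1.5 m camera height
focal_px = 1400.0  # assumed focal length in pixels (Depth Pro also estimates this)

# Pinhole model: one pixel covers (depth / focal_px) metres on the ground plane.
gsd_mm_per_px = (depth_m / focal_px) * 1000.0
print(f"median GSD: {np.median(gsd_mm_per_px):.2f} mm/pixel")

# Classic line extraction from a binary segmentation mask via the
# probabilistic Hough transform (one conventional option among several).
mask = np.zeros((1024, 1024), dtype=np.uint8)
cv2.line(mask, (100, 1000), (400, 0), 255, 15)  # synthetic "crop row" for demo
lines = cv2.HoughLinesP(mask, rho=1, theta=np.pi / 180, threshold=100,
                        minLineLength=200, maxLineGap=50)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        length_px = np.hypot(x2 - x1, y2 - y1)
        print(f"row segment: ({x1},{y1})->({x2},{y2}), ~{length_px:.0f} px")
```

Multiplying a segment's pixel length by the local GSD converts it to a physical row length, which is how the depth model ties the segmentation output to real-world spatial measurements.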