Improved crop row detection by employing attention-based vision transformers and convolutional neural networks with integrated depth modeling for precise spatial accuracy

Hassan Afzaal, Derek Rude, Aitazaz A. Farooque, Gurjit S. Randhawa, Arnold W. Schumann, Nicholas Krouglicof

Smart Agricultural Technology, vol. 11, Article 100934 (published 2025-04-05). DOI: 10.1016/j.atech.2025.100934
https://www.sciencedirect.com/science/article/pii/S2772375525001674
Abstract
Precision agriculture has emerged as a revolutionary technology for tackling global food security issues by optimizing crop yield and resource management. Incorporating artificial intelligence (AI) within agricultural practices has fundamentally transformed the discipline by facilitating sophisticated data analysis, predictive modeling, and automation. This research presents a novel framework that integrates deep learning, precision agriculture, and depth modeling to accurately detect crop rows and recover spatial information. The proposed framework employs the latest attention- and convolution-based encoders, such as ConvFormer, CAFormer, Swin Transformer, and ConvNextV2, to precisely identify crop rows across varied and challenging agricultural environments. The binary segmentation models were trained on a high-resolution soybean crop dataset (733 images) collected from fifteen distinct locations in Canada during different growth phases. LabelMe was used to annotate the segmentation dataset, and the Albumentations library was used for data augmentation to enhance generalization and robustness. With training (∼70 %, 513 images), validation (∼15 %, 109 images), and test (∼15 %, 111 images) splits, the models learned to differentiate crop rows from background noise, achieving notable accuracy across multiple metrics, including Precision, Recall, F1 Score, and Dice Score.
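The abstract does not publish the preprocessing code; the following is a minimal sketch of a ∼70/15/15 split and an Albumentations augmentation pipeline consistent with the description above. The specific transforms, working resolution, and file names are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch of the dataset split and augmentation step described above.
# Exact transforms and resolution are assumptions, not the paper's settings.
import numpy as np
import albumentations as A
from sklearn.model_selection import train_test_split

# Roughly 70/15/15 split over the 733 images reported in the abstract.
paths = [f"soybean_{i:04d}.png" for i in range(733)]  # hypothetical file names
train_paths, rest = train_test_split(paths, test_size=0.30, random_state=42)
val_paths, test_paths = train_test_split(rest, test_size=0.50, random_state=42)

# Albumentations applies identical spatial transforms to image and mask,
# keeping LabelMe-derived row annotations aligned after augmentation.
train_transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.Resize(512, 512),  # assumed working resolution
])

image = np.zeros((1024, 1024, 3), dtype=np.uint8)  # placeholder RGB frame
mask = np.zeros((1024, 1024), dtype=np.uint8)      # placeholder binary row mask
augmented = train_transform(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]
```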
An essential element of this pipeline is incorporating the Depth Pro model for precise computation of Ground Sampling Distance (GSD) by estimating absolute depth maps and camera height from the images. The depth maps were analyzed to examine GSD variability across fifteen clusters of field images, revealing GSD values ranging from 0.5 to 2.0 mm/pixel for most clusters. The proposed model demonstrates superior performance in crop row segmentation tasks, achieving an F1 Score of 0.8012, Precision of 0.8512, Recall of 0.7584, and Accuracy of 0.8477 on the validation set. In a comparative analysis with state-of-the-art (SOTA) models, ConvFormer outperformed alternatives such as ConvNextV2, CAFormer, and Swin S3 across multiple metrics. Notably, ConvFormer achieves a better balance of precision and recall than ResNet models, which exhibit lower metrics (e.g., an F1 Score of 0.7307 and Recall of 0.6551), underscoring its effectiveness in complex agricultural scenarios. Furthermore, classic machine vision methods were tested for extracting line information from the binary segmentation masks, which can be useful for plant analytics, autonomous driving, and various other applications. The proposed workflow offers a robust solution for automating field operations, optimizing resource efficiency, and improving crop productivity.
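The abstract does not detail the GSD computation or the line-extraction routine; the sketch below shows one plausible reading under the standard pinhole model (ground coverage per pixel ≈ depth / focal length in pixels), with a probabilistic Hough transform as a representative classic machine-vision method. The depth values, focal length, and choice of OpenCV routine are assumptions, not the authors' implementation.

```python
# Sketch of per-pixel GSD from a metric depth map, plus Hough-based line
# extraction from a binary crop-row mask. All values are placeholders.
import numpy as np
import cv2

depth_m = np.full((1024, 1024), 1.5, dtype=np.float32)  # assumed 1.5 m camera height
focal_px = 1400.0  # assumed focal length in pixels (Depth Pro also estimates this)

# Pinhole model: one pixel covers (depth / focal_px) metres on the ground plane.
gsd_mm_per_px = (depth_m / focal_px) * 1000.0
print(f"median GSD: {np.median(gsd_mm_per_px):.2f} mm/pixel")

# Classic line extraction from a binary segmentation mask via the
# probabilistic Hough transform (one conventional option among several).
mask = np.zeros((1024, 1024), dtype=np.uint8)
cv2.line(mask, (100, 1000), (400, 0), 255, 15)  # synthetic "crop row" for demo
lines = cv2.HoughLinesP(mask, rho=1, theta=np.pi / 180, threshold=100,
                        minLineLength=200, maxLineGap=50)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        length_px = np.hypot(x2 - x1, y2 - y1)
        print(f"row segment: ({x1},{y1})->({x2},{y2}), ~{length_px:.0f} px")
```

Multiplying a segment's pixel length by the local GSD converts it to a physical row length, which is how the depth model ties the segmentation output to real-world spatial measurements.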