Multi-Modal Vision Transformer for high-resolution soil texture prediction of German agricultural soils using remote sensing imagery

Impact factor: 11.4; Zone 1 (Earth Sciences); JCR Q1 (Environmental Sciences)
Lucas Wittstruck, Björn Waske, Thomas Jarmer
Journal: Remote Sensing of Environment, volume 331, Article 114985
Publication date: 2025-09-04
DOI: 10.1016/j.rse.2025.114985
URL: https://www.sciencedirect.com/science/article/pii/S003442572500389X
Citations: 0

Abstract

The quantification and mapping of important soil properties, such as soil texture, are vital for effective crop management and the assessment of overall soil health in agricultural systems. In this study, we propose a multi-modal Visual Transformer (MMVT) architecture to predict and map the soil particle size distribution of agricultural topsoils in Germany at a high spatial resolution of 10 meters. Our modeling utilized multi-source bare soil satellite image composites with terrain and soil-related covariates. To optimize the model's ability to capture spatial soil context, various image sizes were evaluated. The study findings highlighted the effectiveness of our MMVT model, demonstrating improved estimation accuracies compared to a two-dimensional Convolutional Neural Network (2D CNN) and a Random Forest (RF) model. Specifically, the proposed transformer network achieved the highest averaged validated accuracy in predicting the soil texture when incorporating an image context of 320 × 320 m around the soil sampling positions (Sand: R² = 0.74, RMSE = 14.78%, and RPIQ = 3.52; Silt: R² = 0.73, RMSE = 12.36%, and RPIQ = 3.50; Clay: R² = 0.52, RMSE = 6.30%, and RPIQ = 1.95). This integrated approach underscores the potential of advanced deep learning techniques and multi-modal learning in providing comprehensive insights into soil characteristics with high resolution and at a large scale.
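The three accuracy measures reported in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions (R² as one minus the ratio of residual to total sum of squares, RMSE as the root of the mean squared error, and RPIQ as the interquartile range of the observations divided by the RMSE). The abstract does not state how the authors computed them, so the function names and the quartile method below are illustrative assumptions, not the paper's implementation.

```python
import math
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (here: percent)."""
    return math.sqrt(statistics.fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def rpiq(y_true, y_pred):
    """Ratio of performance to interquartile distance: IQR(observed) / RMSE."""
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, _, q3 = statistics.quantiles(y_true, n=4, method="inclusive")
    return (q3 - q1) / rmse(y_true, y_pred)


def patch_size_pixels(extent_m, resolution_m):
    """Number of pixels along one side of a square image patch."""
    return round(extent_m / resolution_m)


# Hypothetical sand contents (%) at five sampling points, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 37.0, 50.0]

print(f"R2   = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.2f} %")
print(f"RPIQ = {rpiq(observed, predicted):.2f}")
print(f"Patch side = {patch_size_pixels(320, 10)} pixels")
```

If the image patches are sampled on the 10 m grid mentioned in the abstract, a 320 × 320 m context corresponds to 32 × 32 pixels. Because RPIQ scales the error by the spread of the observed values, it allows a rough comparison across the sand, silt, and clay fractions even though their RMSE values differ in magnitude.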
Source journal: Remote Sensing of Environment (category: Environmental Sciences; Imaging Science and Photographic Technology)
CiteScore: 25.10
Self-citation rate: 8.90%
Articles published: 455
Review time: 53 days
Journal description: Remote Sensing of Environment (RSE) serves the Earth observation community by disseminating results on the theory, science, applications, and technology that contribute to advancing the field of remote sensing. With a thoroughly interdisciplinary approach, RSE encompasses terrestrial, oceanic, and atmospheric sensing. The journal emphasizes biophysical and quantitative approaches to remote sensing at local to global scales, covering a diverse range of applications and techniques. RSE serves as a vital platform for the exchange of knowledge and advancements in the dynamic field of remote sensing.