Yiqun Xie , Anh N. Nhu , Xiao-Peng Song , Xiaowei Jia , Sergii Skakun , Haijun Li , Zhihao Wang
{"title":"Accounting for spatial variability with geo-aware random forest: A case study for US major crop mapping","authors":"Yiqun Xie , Anh N. Nhu , Xiao-Peng Song , Xiaowei Jia , Sergii Skakun , Haijun Li , Zhihao Wang","doi":"10.1016/j.rse.2024.114585","DOIUrl":null,"url":null,"abstract":"<div><div>Spatial variability has been one of the major challenges for large-area crop monitoring and classification with remote sensing. Recent works on deep learning have introduced spatial transformation methods to automatically partition a heterogeneous region into multiple homogeneous sub-regions during the training process. However, the framework is only designed for deep learning and is not available for other models, e.g., decision tree and random forest, which are frequently the models of choice in many crop mapping products. This paper develops a geo-aware random forest (Geo-RF) model to enable new capabilities to automatically recognize spatial variability during training, partition the space, and learn local models. Specifically, Geo-RF can capture spatial partitions with flexible shapes via an efficient bi-partitioning optimization algorithm. Geo-RF also automatically determines the number of partitions needed in a hierarchical manner via statistical tests and builds local RF models along the partitioning process to explicitly address spatial variability and improve classification quality. We used both synthetic and real-world data to evaluate the effectiveness of Geo-RF. First, through the controlled synthetic experiment, Geo-RF demonstrated the ability to capture the artificially-inserted true partition where a different relationship between the inputs and outputs is used. Second, we showed the improvements from Geo-RF using crop classification for five major crops over the contiguous US. The results demonstrated that Geo-RF is able to significantly improve classification performance in sub-regions that are otherwise compromised in a single RF model. For example, the partition around downstream Mississippi for soybean classification led to major improvements for about 0.10-0.25 in F1 scores in the area, and the score increased from 0.57 to 0.82 at certain locations. Similarly, for rice classification, the partition in Arkansas led to F1 scores increasing from 0.59 to 0.88 in local areas. In addition, we evaluated the models under different parameter settings, and the results showed that Geo-RF led to improvements over RF in the vast majority of scenarios (e.g., varying model complexity and training sizes). Computationally, Geo-RF took about one to three times more training time while its execution time during testing was similar to that of RF. Overall, Geo-RF showed the ability to automatically address spatial variability via partitioning optimization, which is an important skill for improving crop classification over heterogeneous geographic areas at large scale. Future research can explore the use of Geo-RF for other geographic regions and applications, interpretable methods to understand the data-driven partitioning, and new designs to further enhance the computational efficiency.</div></div>","PeriodicalId":417,"journal":{"name":"Remote Sensing of Environment","volume":"319 ","pages":"Article 114585"},"PeriodicalIF":11.1000,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Remote Sensing of Environment","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0034425724006114","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENVIRONMENTAL SCIENCES","Score":null,"Total":0}
引用次数: 0
Abstract
Spatial variability has been one of the major challenges for large-area crop monitoring and classification with remote sensing. Recent works on deep learning have introduced spatial transformation methods to automatically partition a heterogeneous region into multiple homogeneous sub-regions during the training process. However, the framework is only designed for deep learning and is not available for other models, e.g., decision tree and random forest, which are frequently the models of choice in many crop mapping products. This paper develops a geo-aware random forest (Geo-RF) model to enable new capabilities to automatically recognize spatial variability during training, partition the space, and learn local models. Specifically, Geo-RF can capture spatial partitions with flexible shapes via an efficient bi-partitioning optimization algorithm. Geo-RF also automatically determines the number of partitions needed in a hierarchical manner via statistical tests and builds local RF models along the partitioning process to explicitly address spatial variability and improve classification quality. We used both synthetic and real-world data to evaluate the effectiveness of Geo-RF. First, through the controlled synthetic experiment, Geo-RF demonstrated the ability to capture the artificially-inserted true partition where a different relationship between the inputs and outputs is used. Second, we showed the improvements from Geo-RF using crop classification for five major crops over the contiguous US. The results demonstrated that Geo-RF is able to significantly improve classification performance in sub-regions that are otherwise compromised in a single RF model. For example, the partition around downstream Mississippi for soybean classification led to major improvements for about 0.10-0.25 in F1 scores in the area, and the score increased from 0.57 to 0.82 at certain locations. Similarly, for rice classification, the partition in Arkansas led to F1 scores increasing from 0.59 to 0.88 in local areas. In addition, we evaluated the models under different parameter settings, and the results showed that Geo-RF led to improvements over RF in the vast majority of scenarios (e.g., varying model complexity and training sizes). Computationally, Geo-RF took about one to three times more training time while its execution time during testing was similar to that of RF. Overall, Geo-RF showed the ability to automatically address spatial variability via partitioning optimization, which is an important skill for improving crop classification over heterogeneous geographic areas at large scale. Future research can explore the use of Geo-RF for other geographic regions and applications, interpretable methods to understand the data-driven partitioning, and new designs to further enhance the computational efficiency.
期刊介绍:
Remote Sensing of Environment (RSE) serves the Earth observation community by disseminating results on the theory, science, applications, and technology that contribute to advancing the field of remote sensing. With a thoroughly interdisciplinary approach, RSE encompasses terrestrial, oceanic, and atmospheric sensing.
The journal emphasizes biophysical and quantitative approaches to remote sensing at local to global scales, covering a diverse range of applications and techniques.
RSE serves as a vital platform for the exchange of knowledge and advancements in the dynamic field of remote sensing.