Remote estimates of suspended particulate matter in global lakes using machine learning models
Zhidan Wen, Qiang Wang, Yue Ma, Pierre Andre Jacinthe, Ge Liu, Sijia Li, Yingxin Shang, Hui Tao, Chong Fang, Lili Lyu, Baohua Zhang, Kaishan Song
Journal: International Soil and Water Conservation Research, vol. 12, no. 1, pages 200-216 (impact factor 7.3; JCR Q1, Environmental Sciences)
Publication date: 2023-07-16
DOI: 10.1016/j.iswcr.2023.07.002
URL: https://www.sciencedirect.com/science/article/pii/S2095633923000564
Citations: 0
Abstract
Suspended particulate matter (SPM) in lakes strongly affects light propagation and aquatic ecosystem productivity, and it co-varies with nutrients, heavy metals, and micro-pollutants in the water. SPM absorbs and backscatters light strongly, which ultimately alters the water-leaving signals detected by satellite sensors. Simple regression models based on specific bands or band ratios have been widely used to estimate SPM, with moderate accuracy. There is still room to improve model accuracy, and machine learning models may capture the non-linear relationships between spectral variables and SPM. We assembled more than 16,400 in situ SPM measurements from lakes on six continents (all except Antarctica), of which 9640 samples were matched with Landsat overpasses within ±7 days. Seven machine learning algorithms and two simple regression methods (linear and partial least squares models) were used to estimate SPM in lakes, and their performance was compared. To address the problem of imbalanced datasets in regression, the Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise (SMOGN) was adopted. The comparison showed that the gradient boosting decision tree (GBDT), random forest (RF), and extreme gradient boosting (XGBoost) models demonstrated good spatiotemporal transferability with the SMOGN-processed dataset and have the potential to map SPM in different years from good-quality Landsat land surface reflectance images. Among all the tested modeling approaches, the GBDT model calibrated accurately (n = 6428, R² = 0.95, MAPE = 29.8%) on SPM collected from 2235 lakes across the world, and its validation (n = 3214, R² = 0.84, MAPE = 38.8%) also showed stable performance. The RF model also performed well on the calibration (R² = 0.93) and validation (R² = 0.86, MAPE = 24.2%) datasets. We applied the GBDT and RF models to map SPM in typical lakes and obtained satisfactory results. In addition, the GBDT model was evaluated against historical SPM measurements coincident with different Landsat sensors (L5-TM, L7-ETM+, and L8-OLI); the model therefore has the potential to map lake SPM for monitoring temporal variations and to track lake SPM dynamics over approximately the past four decades (1984–2021), since Landsat-5/TM was launched in 1984.
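As a rough illustration of two quantities the abstract relies on, the sketch below shows a ±7 day match-up test between a field sample and a satellite overpass, plus the R² and MAPE statistics. It is not the authors' code: the function names and the sample values are hypothetical, and R² is assumed here to be the coefficient of determination, since the abstract does not say which definition was used.

```python
from datetime import date


def within_window(sample_day: date, overpass_day: date, max_days: int = 7) -> bool:
    """True when the overpass falls within +/- max_days of the field sample."""
    return abs((overpass_day - sample_day).days) <= max_days


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mape(observed, predicted):
    """Mean absolute percentage error, in percent; observed values must be non-zero."""
    total = sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Hypothetical SPM values (mg/L), for illustration only.
observed = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 44.0]

print(within_window(date(2015, 7, 1), date(2015, 7, 8)))  # 7 days apart
print(round(r_squared(observed, predicted), 4))
print(round(mape(observed, predicted), 2))
```

Because MAPE divides each error by the observed value, small SPM concentrations weigh heavily in it, which is one reason a model can show a high R² alongside a MAPE of several tens of percent.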
About the journal:
International Soil and Water Conservation Research (ISWCR), the official journal of the World Association of Soil and Water Conservation (WASWAC, http://www.waswac.org), is a multidisciplinary journal of soil and water conservation research, practice, policy, and perspectives. It aims to disseminate new knowledge and promote the practice of soil and water conservation.
The scope of International Soil and Water Conservation Research includes research, strategies, and technologies for the prediction, prevention, and protection of soil and water resources. It covers identification, characterization, and modeling; dynamic monitoring and evaluation; assessment and management of conservation practice; and the creation and implementation of quality standards.
Examples of appropriate topical areas include (but are not limited to):
• Conservation models, tools, and technologies
• Conservation agriculture
• Soil health resources, indicators, assessment, and management
• Land degradation
• Sustainable development
• Soil erosion and its control
• Soil erosion processes
• Water resources assessment and management
• Watershed management
• Soil erosion models
• Literature reviews on topics related to soil and water conservation research