Modeling the global ocean distribution of dissolved cadmium based on machine learning-SHAP algorithm.
Ziyuan Jiang, Enhui Liao, Ziang Li, Ruifeng Zhang
Science of the Total Environment, vol. 958, article 177951 (2025). DOI: 10.1016/j.scitotenv.2024.177951. Published online 12 December 2024.
Cadmium (Cd) is a bio-essential trace metal in the ocean that can be toxic at high concentrations, significantly affecting the marine environment and phytoplankton growth. Its distribution closely tracks that of phosphate (PO₄), although the mechanism is not fully understood. At low concentrations, evidence indicates that Cd can act as an enzyme cofactor in biological processes. The spatial distribution of dissolved cadmium (dCd) remains poorly understood, largely because current observational data are limited. Building on these observations, this study applied machine learning methods to reconstruct a global dCd dataset, aiming to improve the accuracy and comprehensiveness of analyses of dCd cycling. Of the five algorithms compared (artificial neural network, support vector machine, Lasso regression, k-nearest neighbors, and random forest), the random forest model performed best (R² = 0.99, RMSE = 0.035 nmol kg⁻¹, MAE = 0.019 nmol kg⁻¹, MAPE = 0.345), reducing bias by 25 % relative to previous studies. Using the SHapley Additive exPlanations (SHAP) approach, this study examined the factors controlling the dCd distribution at different depths and discussed possible causes of changes in the Cd-PO₄ relationship. The results showed that the temporal and spatial variability of Cd was shaped by surface biological processes, deep-sea mineralization, and water-column stratification. Variations in the Cd-PO₄ relationship were linked to differences in biological fractionation inside and outside high-nutrient, low-chlorophyll (HNLC) regions, and to the mixing of water masses with different Cd:PO₄ ratios. Further analysis indicated that more than 80 % of the particles whose degradation releases Cd and PO₄ were produced in HNLC regions.
This study highlights the broad potential of machine learning in oceanography, offering a global perspective on Cd cycling and new insights into the mechanisms driving element cycling.
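The abstract scores the random forest with four standard regression metrics: an R-squared score, RMSE, MAE and MAPE. The minimal sketch below shows how such scores are commonly computed from paired observed and predicted dCd values. It assumes that R-squared is the coefficient of determination (1 - SS_res/SS_tot) rather than a squared correlation, and that MAPE is expressed as a fraction; the abstract does not say which conventions the authors used. The helper `regression_metrics` and the sample values are hypothetical, not the study's code or data.

```python
import math
from typing import Sequence


def regression_metrics(observed: Sequence[float], predicted: Sequence[float]) -> dict:
    """Compute R^2, RMSE, MAE and MAPE for paired observations and predictions."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    # MAPE is undefined where an observation equals zero, so those points are skipped.
    rel_errors = [abs(r / o) for r, o in zip(residuals, observed) if o != 0]
    return {
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else math.nan,
        "rmse": math.sqrt(ss_res / n),                    # same units as the data, e.g. nmol/kg
        "mae": sum(abs(r) for r in residuals) / n,        # same units as the data
        "mape": sum(rel_errors) / len(rel_errors) if rel_errors else math.nan,  # fraction, not %
    }


# Hypothetical dCd values in nmol/kg, invented for illustration only.
obs = [0.10, 0.40, 0.70, 0.90]
pred = [0.12, 0.38, 0.71, 0.88]
print(regression_metrics(obs, pred))
```

RMSE and MAE carry the units of the data (nmol kg⁻¹ here), while R-squared and MAPE are dimensionless, which is why only the first two are reported with units in the abstract.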
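SHAP, which the study uses to interpret its model, attributes each prediction to the input features using Shapley values from cooperative game theory. The sketch below computes those values exactly by enumerating every coalition of features, treating a feature that is "absent" from a coalition as set to a single baseline value. This is a simplifying assumption: practical SHAP tools use faster algorithms (for example TreeSHAP for tree ensembles such as random forests) and usually average over background data rather than one baseline point. The function `exact_shapley` and the three-input `toy_model` are hypothetical illustrations, not the authors' model or predictors.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(model: Callable[[Sequence[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley value of each feature for model(x) relative to a baseline point.

    Features outside a coalition are set to their baseline value. Every coalition
    is visited, so cost grows as 2**n and this only suits a few features.
    """
    n = len(x)

    def coalition_value(members: frozenset) -> float:
        z = [x[i] if i in members else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (coalition_value(s | {i}) - coalition_value(s))
    return phi


def toy_model(z: Sequence[float]) -> float:
    # Arbitrary toy function with one interaction term; not a fitted dCd model.
    return 0.3 * z[0] + 0.1 * z[1] + 0.2 * z[0] * z[2]


x = [2.0, 1.0, 3.0]
baseline = [1.0, 0.0, 1.0]
phi = exact_shapley(toy_model, x, baseline)
# Efficiency property: the attributions sum to model(x) - model(baseline).
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

For a purely additive model each feature's value reduces to its coefficient times (x_i - baseline_i); here the interaction term between features 0 and 2 is split between those two features, which is the kind of effect that SHAP-based analyses of depth-dependent controls have to untangle.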
About the journal:
The Science of the Total Environment is an international journal dedicated to scientific research on the environment and its interaction with humanity. It covers a wide range of disciplines and seeks to publish innovative, hypothesis-driven, and impactful research that explores the entire environment, including the atmosphere, lithosphere, hydrosphere, biosphere, and anthroposphere.
The journal's updated Aims & Scope emphasizes the importance of interdisciplinary environmental research with broad impact. Priority is given to studies that advance fundamental understanding and explore the interconnectedness of multiple environmental spheres. Field studies are preferred, while laboratory experiments must demonstrate significant methodological advancements or mechanistic insights with direct relevance to the environment.