Assessing the robustness and generalizability of machine learning models for predicting selenium content in rice: a case study from the Pearl River Delta and Eastern Guangdong, China.

Impact factor 3.8; Zone 3, Environmental Science and Ecology; JCR Q3, Engineering, Environmental
Guiqi Ye, Tingting Li, Wenda Geng, Kun Qian, Xudong Ma, Qingye Hou, Tao Yu, Zhongfang Yang, Xin Zhu
Environmental Geochemistry and Health, 47 (9), 382. Journal Article. Published 2025-08-13. DOI: 10.1007/s10653-025-02681-9 (https://doi.org/10.1007/s10653-025-02681-9)

Abstract

Crop selenium (Se) uptake is influenced by complex factors, which has prompted extensive research on predicting Se content in crop grains and led to the development of various prediction methods. However, the practical application of these models is limited by geographical constraints and variations in independent variables. This study selected two distinct regions in Guangdong Province, China: the Pearl River Delta (PRD), a Quaternary plain region, and Heyuan, a hilly region characterized by outcrops of clastic rocks. A total of 205 paired rice and rhizosphere soil samples from the PRD (2016) and 60 paired samples from Heyuan (2023) were collected to assess model robustness and generalizability. In the PRD and Heyuan, respectively, 82.93% and 30.00% of soil samples had Se ≥ 0.40 mg/kg, and 72.68% and 38.33% of rice grain samples had Se ≥ 0.04 mg/kg. However, no significant positive correlation was observed between soil Se and rice grain Se content in either area. Further analysis found that the main factors influencing rice grain Se content were soil SiO2, Al2O3, total organic carbon (TOC), S, and pH. Applied separately to the datasets from the two time periods, the model yielded strong results, indicating that it is robust and does not fluctuate greatly with the time of sample collection. The five feature subsets were used to make predictions for the two regions separately, with significant results. This indicates that the subset of predictive model features is highly generalizable, and that differences in the lithology of the soil parent materials and in topography do not significantly affect the prediction results.
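
The threshold percentages in the abstract can be read back as whole-number sample counts. The sketch below is illustrative only: the abstract reports percentages and sample sizes, not counts, so the counts are inferred here, and the function names `implied_count` and `share_at_or_above` are hypothetical helpers rather than anything from the paper.

```python
def implied_count(percent: float, n: int) -> int:
    """Whole-number sample count closest to `percent` percent of `n` samples."""
    return round(percent / 100 * n)


def share_at_or_above(values, threshold: float) -> float:
    """Percentage of values that meet or exceed `threshold` (the '>=' criterion)."""
    values = list(values)
    return 100 * sum(v >= threshold for v in values) / len(values)


# Percentages and sample sizes as stated in the abstract.
reported = {
    ("PRD", "soil Se >= 0.40 mg/kg"): (82.93, 205),
    ("Heyuan", "soil Se >= 0.40 mg/kg"): (30.00, 60),
    ("PRD", "rice grain Se >= 0.04 mg/kg"): (72.68, 205),
    ("Heyuan", "rice grain Se >= 0.04 mg/kg"): (38.33, 60),
}

for (region, criterion), (percent, n) in reported.items():
    k = implied_count(percent, n)  # inferred, not reported in the abstract
    print(f"{region}: {criterion}: about {k} of {n} samples ({100 * k / n:.2f}%)")
```

Each inferred count reproduces the reported percentage to two decimal places, which suggests the percentages were computed over the full 205 and 60 paired samples. `share_at_or_above` shows how such a percentage would be computed from raw measurements, which the abstract does not include.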

Source journal
Environmental Geochemistry and Health (Environmental Sciences; Engineering, Environmental)
CiteScore: 8.00
Self-citation rate: 4.80%
Articles published: 279
Review time: 4.2 months
About the journal: Environmental Geochemistry and Health publishes original research papers and review papers across the broad field of environmental geochemistry. Environmental geochemistry and health establishes and explains links between the natural or disturbed chemical composition of the earth’s surface and the health of plants, animals and people. Beneficial elements regulate or promote enzymatic and hormonal activity whereas other elements may be toxic. Bedrock geochemistry controls the composition of soil and hence that of water and vegetation. Environmental issues, such as pollution, arising from the extraction and use of mineral resources, are discussed. The effects of contaminants introduced into the earth’s geochemical systems are examined. Geochemical surveys of soil, water and plants show how major and trace elements are distributed geographically. Associated epidemiological studies reveal the possibility of causal links between the natural or disturbed geochemical environment and disease. Experimental research illuminates the nature or consequences of natural or disturbed geochemical processes. The journal particularly welcomes novel research linking environmental geochemistry and health issues on such topics as: heavy metals (including mercury), persistent organic pollutants (POPs), and mixed chemicals emitted through human activities, such as uncontrolled recycling of electronic-waste; waste recycling; surface-atmospheric interaction processes (natural and anthropogenic emissions, vertical transport, deposition, and physical-chemical interaction) of gases and aerosols; phytoremediation/restoration of contaminated sites; food contamination and safety; environmental effects of medicines; effects and toxicity of mixed pollutants; speciation of heavy metals/metalloids; effects of mining; disturbed geochemistry from human behavior, natural or man-made hazards; particle and nanoparticle toxicology; risk and the vulnerability of populations, etc.