Identifying Predictors of Spatiotemporal Variations in Residential Radon Concentrations across North Carolina Using Machine Learning Analytics
Authors: Zhenchun Yang, Lauren Prox, Clare Meernik, Yadurshini Raveendran, David Press, Phillip Gibson, Amie Koch, Olufemi Ajumobi, Jeffrey Clarke, Ruoxue Chen, Junfeng (Jim) Zhang, Tomi Akinyemiju
Journal: Environmental Pollution (Impact Factor 7.6; JCR Q1, Environmental Sciences)
Publication date: 2025-01-04
DOI: 10.1016/j.envpol.2024.125592 (https://doi.org/10.1016/j.envpol.2024.125592)
Citations: 0
Abstract
Radon is a naturally occurring radioactive gas derived from the decay of uranium in the Earth’s crust. Radon exposure is the leading cause of lung cancer among non-smokers in the US. Radon infiltrates homes through soil and building foundations. This study advances methodologies for assessing residential radon exposure by leveraging a comprehensive dataset of 126,382 short-term (2-7 days) radon test results collected across North Carolina from 2010 to 2020. The analysis combines linear regression with advanced machine learning techniques, including random forest models. Analysis of first-time radon test results through linear regression, linear mixed-effects models (LME), and generalized additive models (GAM) reveals that elevation, proximity to geological faults, and soil moisture are pivotal in determining radon concentration. Specifically, elevation consistently shows a positive relationship with radon levels across models (linear regression: β=0.12, p<0.001; LME: β=0.17, p<0.001; GAM: β=0.11, p<0.001). Conversely, the distance to geological faults negatively correlates with radon concentration (linear regression: β=-0.11, p<0.001; LME: β=-0.06, p<0.001; GAM: β=-0.07, p<0.001), indicating lower radon levels farther from faults. Using the random forest model, our study identifies the most influential environmental predictors of first-time radon test results. Elevation is the most influential variable, followed by median instantaneous surface pressure and soil moisture in the upper 10 cm layer, illustrating the significant role of geological and immediate surface conditions. Additional important factors include precipitation, mean temperature, and deeper soil moisture levels (40-200 cm), underscoring the influence of climate on radon variability. Root zone soil moisture and the Normalized Difference Vegetation Index (NDVI) also contribute to predicting radon levels, reflecting the importance of soil and vegetation dynamics in radon emanation. By integrating multiple statistical models, this research provides a nuanced understanding of the predictors of radon concentration, enhancing predictive accuracy and reliability.
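To make the reported regression coefficients concrete, the following is a minimal sketch of how signed β values of the kind quoted above can be obtained from an ordinary least-squares fit. It is not the authors' pipeline. The data are synthetic, the variable names (`elevation`, `fault_km`, `log_radon`) and the data-generating process are invented for illustration, and the sketch assumes the coefficients are standardized, which the abstract does not state.

```python
import random
import statistics


def standardize(values):
    """Return z-scores (mean 0, sample standard deviation 1)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def standardized_betas(predictors, outcome):
    """OLS on z-scored predictors and outcome; returns one beta per predictor."""
    zx = [standardize(col) for col in predictors]
    zy = standardize(outcome)
    k = len(zx)
    # Normal equations (X'X) beta = X'y; centering removes the intercept.
    xtx = [[sum(p * q for p, q in zip(zx[i], zx[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(zx[i], zy)) for i in range(k)]
    return solve(xtx, xty)


def simulate(n=2000, seed=0):
    """Synthetic homes (hypothetical): radon rises with elevation, falls with fault distance."""
    rng = random.Random(seed)
    elevation = [rng.uniform(0, 1500) for _ in range(n)]  # metres, invented range
    fault_km = [rng.uniform(0, 50) for _ in range(n)]     # km to nearest fault, invented range
    log_radon = [0.001 * e - 0.02 * f + rng.gauss(0, 1.0)
                 for e, f in zip(elevation, fault_km)]
    return elevation, fault_km, log_radon


elevation, fault_km, log_radon = simulate()
beta_elev, beta_fault = standardized_betas([elevation, fault_km], log_radon)
print(f"elevation beta: {beta_elev:+.2f}, fault distance beta: {beta_fault:+.2f}")
```

Standardizing both sides puts predictors measured in different units on a common scale, so the magnitudes of the coefficients can be compared directly. The mixed-effects, additive, and random forest models named in the abstract need dedicated libraries and are not reproduced here.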
About the journal:
Environmental Pollution is an international peer-reviewed journal that publishes high-quality research papers and review articles covering all aspects of environmental pollution and its impacts on ecosystems and human health.
Subject areas include, but are not limited to:
• Sources and occurrences of pollutants that are clearly defined and measured in environmental compartments, food and food-related items, and human bodies;
• Interlinks between contaminant exposure and biological, ecological, and human health effects, including those of climate change;
• Contaminants of emerging concern (including but not limited to antibiotic-resistant microorganisms or genes, microplastics/nanoplastics, electronic waste, light, and noise) and/or their biological, ecological, or human health effects;
• Laboratory and field studies on the remediation/mitigation of environmental pollution via new techniques and with clear links to biological, ecological, or human health effects;
• Modeling of pollution processes, patterns, or trends that is of clear environmental and/or human health interest;
• New techniques that measure and examine environmental occurrences, transport, behavior, and effects of pollutants within the environment or the laboratory, provided that they can be clearly used to address problems within regional or global environmental compartments.