Yichen Jiang, Su Shi, Xinyue Li, Chang Xu, Haidong Kan, Bo Hu, Xia Meng
Earth System Science Data (IF 11.2, JCR Q1) · Published 2024-05-15 · DOI: https://doi.org/10.5194/essd-2024-111
Citations: 0
A 10 km daily-level ultraviolet radiation predicting dataset based on machine learning models in China from 2005 to 2020
Abstract. Ultraviolet (UV) radiation is closely related to health, but limited measurements have hindered investigation of its health effects in China. Machine learning algorithms have been widely used to predict environmental factors with high accuracy, but few studies have applied them to UV radiation. This study aimed to develop a UV radiation prediction model based on the random forest method and to predict UV radiation at the daily level and 10 km resolution across mainland China for 2005–2020. A random forest model was used to predict UV radiation by integrating ground-based UV radiation measurements from monitoring stations with multiple predictors, such as satellite-derived UV radiation data. Missing satellite-based UV radiation data were filled using a three-day moving average. The model's performance was evaluated with multiple cross-validation (CV) methods. At the daily level, the overall R² (root mean square error, RMSE) between measured and predicted UV radiation was 0.97 (15.64 W m⁻²) for model development and 0.83 (37.44 W m⁻²) for 10-fold CV. The model that included OMI EDD achieved higher predictive accuracy than the one without it. Based on daily predictions at 10 km spatial resolution with nearly 100 % spatiotemporal coverage, we found that UV radiation increased by 4.20 % during 2013–2020, while PM2.5 levels decreased by 48.51 % and O3 levels rose by 22.70 %, suggesting a potential correlation among these environmental factors. The uneven spatial distribution of UV radiation was associated with latitude, elevation, meteorological factors, and season. The eastern areas of China face higher risk because they combine high population density with high UV radiation intensity. Using a machine learning algorithm, this study generated a gridded UV radiation dataset with relatively high precision and extensive spatiotemporal coverage, which captures the spatiotemporal variability of UV radiation levels in China and can facilitate future health-related research. The dataset is freely available at https://doi.org/10.5281/zenodo.10884591 (Jiang et al., 2024).
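The gap-filling and evaluation steps named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' code. The function names are hypothetical, the three-day window is assumed to be centred on the missing day (the abstract does not say whether it is centred or trailing), R² is computed here as the coefficient of determination (the abstract does not state which definition was used), and the fold assignment is a simple deterministic split rather than whatever sampling scheme the study applied. The random forest itself is omitted.

```python
import math
from typing import Iterator, List, Optional, Sequence, Tuple


def fill_three_day_moving_average(
    series: Sequence[Optional[float]],
) -> List[Optional[float]]:
    """Fill each missing day (None) with the mean of its available neighbours.

    Assumption: the window is centred (day before, missing day, day after).
    A gap stays None when neither neighbour was observed.
    """
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # Only originally observed neighbours are used, so fills do not cascade.
        neighbours = [
            series[j]
            for j in (i - 1, i + 1)
            if 0 <= j < len(series) and series[j] is not None
        ]
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between measured and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(
    n_samples: int, k: int = 10
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; sample i is held out in fold i % k.

    Deterministic for illustration; real use would usually shuffle first.
    """
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test


if __name__ == "__main__":
    # Toy satellite series with gaps (values are made up for illustration).
    satellite = [100.0, None, 120.0, None, None, 90.0]
    print(fill_three_day_moving_average(satellite))

    # Toy measured vs. predicted values.
    measured = [1.0, 2.0, 3.0]
    predicted = [1.0, 2.0, 5.0]
    print(rmse(measured, predicted), r_squared(measured, predicted))
```

In a 10-fold CV, each held-out fold would be predicted by a model trained on the other nine, and `rmse` and `r_squared` would then be computed over the pooled held-out predictions. The gap between the development-set and CV scores reported in the abstract (0.97 versus 0.83) is the kind of difference this procedure is meant to expose.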
Earth System Science Data
Categories: GEOSCIENCES, MULTIDISCIPLINARY; METEOROLOGY - METEOROLOGY & ATMOSPHERIC SCIENCES
CiteScore: 18.00
Self-citation rate: 5.30%
Articles published: 231
Review time: 35 weeks
About the journal:
Earth System Science Data (ESSD) is an international, interdisciplinary journal that publishes articles on original research data in order to promote the reuse of high-quality data in the field of Earth system sciences. The journal welcomes submissions of original data or data collections that meet the required quality standards and have the potential to contribute to the goals of the journal. It includes sections dedicated to regular-length articles, brief communications (such as updates to existing data sets), commentaries, review articles, and special issues. ESSD is abstracted and indexed in several databases, including Science Citation Index Expanded, Current Contents/PCE, Scopus, ADS, CLOCKSS, CNKI, DOAJ, EBSCO, Gale/Cengage, GoOA (CAS), and Google Scholar, among others.