BORF: A Bayesian optimized random forest for prediction of aerosol extinction coefficient from Mie Lidar signal
Authors: Hao Chen, Fei Gao, Zhimin Rao, Dengxin Hua
Journal: Applied Soft Computing, Volume 176, Article 113130
Published: 2025-04-17
DOI: 10.1016/j.asoc.2025.113130
URL: https://www.sciencedirect.com/science/article/pii/S1568494625004417
Abstract
In continuous lidar observations, identifying and selecting valid signals is crucial, especially for retrieving the aerosol extinction coefficient. In this study, we developed the Bayesian Optimized Random Forest (BORF) model, a machine learning approach that combines Random Forest (RF) regression with Bayesian optimization, to predict aerosol extinction coefficients. The model builds on RF regression and uses Bayesian optimization to tune the model parameters, which significantly improves the accuracy of the predicted extinction coefficients. The approach also offers a practical means of identifying and screening anomalous lidar signals. We constructed a training dataset from continuously observed Mie lidar signals and aerosol extinction coefficients retrieved with the Klett method. The dataset's features are the Mie lidar signal, detection time, detection distance, pressure, and temperature. This paper describes in detail how the BORF model is established and how its parameters are tuned with Bayesian optimization. Through model assessments, significance tests, and comparative experiments, we demonstrate the effectiveness of the BORF model. Experimental results indicate that the BORF model outperforms the other models considered in predicting aerosol extinction coefficients, and that its predictions closely match the Klett retrievals. Specifically, on datasets with better data quality, the BORF model shows an approximately 4% increase in R² compared with RF and with a BP neural network optimized by a genetic algorithm (BPGA), accompanied by a 41% to 47% reduction in Mean Squared Error (MSE) and Mean Absolute Error (MAE). On datasets with lower data quality and less apparent data variations, MSE and MAE decrease by approximately 40% to 90%. This study provides a robust technical solution for ensuring the reliability of lidar data, thereby contributing to a better understanding of atmospheric aerosols and to environmental monitoring.
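The abstract does not state which hyperparameters were tuned, which surrogate model or acquisition function was used, or how the data were split. As a rough illustration of the general idea only (Random Forest regression whose hyperparameters are chosen by Bayesian optimization, then scored with R², MSE, and MAE), the following sketch uses scikit-learn with a Gaussian-process surrogate and an expected-improvement criterion. Everything specific here is an assumption and not the authors' implementation: the synthetic data, the search space bounds, the names `to_params`, `objective`, and `expected_improvement`, and the evaluation budget.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's dataset). The five columns only
# mimic the listed features: signal, time, distance, pressure, temperature.
# The target is an arbitrary smooth function plus noise.
X = rng.uniform(0.0, 1.0, size=(300, 5))
y = np.exp(-2.0 * X[:, 2]) * (1.0 + X[:, 0]) + 0.05 * rng.normal(size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical search space: (n_estimators, max_depth, min_samples_leaf).
LOW = np.array([20, 2, 1])
HIGH = np.array([120, 12, 10])


def to_params(u):
    """Map a point in the unit cube to integer RF hyperparameters."""
    n_est, depth, leaf = np.rint(LOW + u * (HIGH - LOW)).astype(int)
    return {
        "n_estimators": int(n_est),
        "max_depth": int(depth),
        "min_samples_leaf": int(leaf),
    }


def objective(u):
    """3-fold cross-validated MSE of an RF configured at u (lower is better)."""
    model = RandomForestRegressor(random_state=0, **to_params(u))
    scores = cross_val_score(
        model, X_train, y_train, cv=3, scoring="neg_mean_squared_error"
    )
    return -scores.mean()


def expected_improvement(candidates, gp, best):
    """Expected improvement (for minimisation) under the GP surrogate."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)


# Five random initial evaluations, then ten surrogate-guided evaluations.
U = rng.uniform(size=(5, 3))
losses = np.array([objective(u) for u in U])
for _ in range(10):
    gp = GaussianProcessRegressor(
        kernel=Matern(nu=2.5), alpha=1e-4, normalize_y=True, random_state=0
    )
    gp.fit(U, losses)
    candidates = rng.uniform(size=(500, 3))
    ei = expected_improvement(candidates, gp, losses.min())
    u_next = candidates[np.argmax(ei)]
    U = np.vstack([U, u_next])
    losses = np.append(losses, objective(u_next))

# Refit with the best configuration found and score on the held-out split.
best_params = to_params(U[np.argmin(losses)])
tuned = RandomForestRegressor(random_state=0, **best_params).fit(X_train, y_train)
pred = tuned.predict(X_test)
metrics = {
    "R2": r2_score(y_test, pred),
    "MSE": mean_squared_error(y_test, pred),
    "MAE": mean_absolute_error(y_test, pred),
}
print(best_params, metrics)
```

In a real setting the targets would be Klett-retrieved extinction coefficients rather than a synthetic function, and a dedicated Bayesian optimization library could replace the hand-written loop. The loop is spelled out here only to make the surrogate and acquisition steps visible.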
About the journal:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research in application and convergence of the areas of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site will continuously be updated with new articles and the publication time will be short.