Machine learning model to predict rate constants for sonochemical degradation of organic pollutants
Iseul Na, Taeho Kim, Pengpeng Qiu, Younggyu Son
Ultrasonics Sonochemistry, published 2024-08-21. DOI: 10.1016/j.ultsonch.2024.107032
https://www.sciencedirect.com/science/article/pii/S1350417724002803
Citations: 0
Abstract
In this study, machine learning (ML) algorithms were employed to predict the pseudo-first-order reaction rate constants for the sonochemical degradation of aqueous organic pollutants under various conditions. A total of 618 data sets, covering ultrasonic, solution, and pollutant characteristics, were collected from 89 previous studies. Because electrical power (P_ele) and calorimetric power (P_cal) differ, the collected data were divided into two groups: data reported with P_ele and data reported with P_cal. Eight input variables (frequency, power density, pH, temperature, initial concentration, solubility, vapor pressure, and octanol–water partition coefficient, K_ow) and one target variable (the degradation rate constant) were selected for ML. Statistical analysis was conducted, and outliers were determined separately for the two groups. Three ML models, random forest (RF), extreme gradient boosting (XGB), and light gradient boosting machine (LGB), were used to predict the pseudo-first-order reaction rate constants for the removal of aqueous pollutants. The prediction performance of the ML models was evaluated using the root mean squared error (RMSE), mean absolute error (MAE), and coefficient of determination (R^2). Prediction performance was significantly higher when outliers were removed and the data were augmented. Consequently, all the applied ML models could be used to predict the sonochemical degradation of aqueous pollutants, and the XGB model showed the highest accuracy in predicting the rate constants. In addition, according to the Shapley additive explanations (SHAP) values, power density and frequency were the most influential of the eight input variables. The degradation rate constants of two pollutants over a wide frequency range (20–1,000 kHz) were predicted using the trained XGB model, and the prediction results were analyzed.
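To make the target variable and the evaluation metrics named in the abstract concrete, the following is a minimal Python sketch, not the authors' code. It assumes the usual pseudo-first-order form ln(C0/C) = k·t and estimates k as a least-squares slope through the origin, then defines RMSE, MAE, and R^2 in their standard forms. All function names and the sample numbers are hypothetical and chosen only for illustration; the abstract does not state how the collected rate constants were originally fitted.

```python
import math


def fit_pseudo_first_order_k(times, concentrations):
    """Estimate k from ln(C0/C) = k * t by least squares through the origin.

    Assumes concentrations[0] is the initial concentration C0 at t = 0
    and that every concentration is strictly positive.
    """
    c0 = concentrations[0]
    y = [math.log(c0 / c) for c in concentrations]
    numerator = sum(t * v for t, v in zip(times, y))
    denominator = sum(t * t for t in times)
    return numerator / denominator


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic decay curve generated with k = 0.05 per minute (illustrative only).
times_min = [0.0, 10.0, 20.0, 30.0]
conc = [100.0 * math.exp(-0.05 * t) for t in times_min]
k_est = fit_pseudo_first_order_k(times_min, conc)

# Hypothetical "observed" versus "model-predicted" rate constants.
k_observed = [0.01, 0.02, 0.03, 0.04]
k_predicted = [0.012, 0.018, 0.033, 0.039]
scores = {
    "rmse": rmse(k_observed, k_predicted),
    "mae": mae(k_observed, k_predicted),
    "r2": r_squared(k_observed, k_predicted),
}
```

In a workflow like the one described, each of the three models would be scored with the same metrics on held-out data, so that RF, XGB, and LGB can be compared on equal terms.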
About the journal:
Ultrasonics Sonochemistry stands as a premier international journal dedicated to the publication of high-quality research articles primarily focusing on chemical reactions and reactors induced by ultrasonic waves, known as sonochemistry. Beyond chemical reactions, the journal also welcomes contributions related to cavitation-induced events and processing, including sonoluminescence, and the transformation of materials on chemical, physical, and biological levels.
Since its inception in 1994, Ultrasonics Sonochemistry has consistently maintained a top ranking in the "Acoustics" category, reflecting its esteemed reputation in the field. The journal publishes exceptional papers covering various areas of ultrasonics and sonochemistry. Its contributions are highly regarded by both academia and industry stakeholders, demonstrating its relevance and impact in advancing research and innovation.