Machine learning modelling of sonochemical systems using physically-derived dimensionless groups
Yucheng Zhu, Ruosi Zhang, Xueliang Zhu, Xuhai Pan, Michael Short, Lian X. Liu, Madeleine J. Bussemaker
Ultrasonics Sonochemistry, volume 122, Article 107593. Published 2025-09-30.
DOI: 10.1016/j.ultsonch.2025.107593
https://www.sciencedirect.com/science/article/pii/S1350417725003724
Impact factor: 9.7 (JCR Q1, Acoustics)
Abstract
Sonochemistry involves complex multiparametric effects and nonlinear interactions that challenge conventional analysis and modelling approaches, especially when extrapolating across systems. Current models mainly depend on dimensional input variables, which limits their generalisability and interpretability. This work proposes a machine learning strategy that integrates physically derived dimensionless variables (Π-terms) into a categorical boosting (CatBoost) algorithm to overcome these limitations. Four representative sonochemical outputs were selected as model targets: sonochemiluminescence (SCL) intensity, SCL area, ultrasonic oxidation from iodide oxidation radicals (IORS), and ultrasonic oxidation from both IORS and H₂O₂. Seven supervised learning algorithms were evaluated: k-nearest neighbours (KNN), linear regression, support vector regression (SVR), random forest, gradient boosting, extreme gradient boosting (XGBoost), and CatBoost. Tree-based models performed best, and CatBoost was ultimately selected as the baseline model. Regression models using the same Π-terms achieved R² = 0.67–0.90 on the full dataset but required dataset-specific corrections to predict independent validation sets. In contrast, the machine learning framework reached higher predictive accuracy (R² = 0.87–0.95 on the reserved test set) and generalised to external validation datasets without additional corrections. Furthermore, a direct comparison between dimensional and dimensionless input strategies showed that dimensionless-input models provided superior generalisability and task-to-task consistency, alleviating the plateau effects observed in dimensional models and yielding more stable feature attributions. SHAP analysis highlighted variables associated with cavitation thermal buffering and energy input scaling (>50% combined importance across tasks), offering mechanistic insights into these nonlinear behaviours that regression could not capture.
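To make the two quantitative ideas in the abstract concrete, the following is a minimal Python sketch of (a) checking that a candidate Π-term is dimensionless by summing base-dimension exponents, and (b) computing the coefficient of determination R² used to report model accuracy. The variable names, the `DIMENSIONS` table, and the example group P/(ρc³L²) are hypothetical illustrations chosen for this sketch; they are not the Π-terms or the code used in the paper.

```python
# Illustrative sketch only: the variables and the example group below are
# assumptions for demonstration, not the paper's actual Pi-terms.

# Base-dimension exponents (mass, length, time) for each dimensional variable.
DIMENSIONS = {
    "power": (1, 2, -3),        # W = kg m^2 s^-3
    "density": (1, -3, 0),      # kg m^-3
    "sound_speed": (0, 1, -1),  # m s^-1
    "length": (0, 1, 0),        # m
}

# Hypothetical group: Pi = P / (rho * c^3 * L^2)
PI_EXAMPLE = {"power": 1, "density": -1, "sound_speed": -3, "length": -2}


def group_dimensions(exponents):
    """Sum the (mass, length, time) exponents of a product of powers."""
    total = [0, 0, 0]
    for name, power in exponents.items():
        for i, dim in enumerate(DIMENSIONS[name]):
            total[i] += dim * power
    return tuple(total)


def is_dimensionless(exponents):
    """A group is dimensionless when every base-dimension exponent is zero."""
    return all(e == 0 for e in group_dimensions(exponents))


def pi_value(values, exponents):
    """Evaluate the group for one experiment given values in consistent units."""
    result = 1.0
    for name, power in exponents.items():
        result *= values[name] ** power
    return result


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In a workflow like the one the abstract describes, values such as `pi_value(...)` would replace the raw dimensional inputs as model features, and `r_squared` would be evaluated on a held-out test set rather than on the training data.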
About the journal:
Ultrasonics Sonochemistry stands as a premier international journal dedicated to the publication of high-quality research articles primarily focusing on chemical reactions and reactors induced by ultrasonic waves, known as sonochemistry. Beyond chemical reactions, the journal also welcomes contributions related to cavitation-induced events and processing, including sonoluminescence, and the transformation of materials on chemical, physical, and biological levels.
Since its inception in 1994, Ultrasonics Sonochemistry has consistently maintained a top ranking in the "Acoustics" category, reflecting its esteemed reputation in the field. The journal publishes exceptional papers covering various areas of ultrasonics and sonochemistry. Its contributions are highly regarded by both academia and industry stakeholders, demonstrating its relevance and impact in advancing research and innovation.