Machine learning modelling of sonochemical systems using physically-derived dimensionless groups

IF 9.7 | Zone 1 (Chemistry) | JCR Q1 (Acoustics)

Ultrasonics Sonochemistry | Pub Date: 2025-09-30 | DOI: 10.1016/j.ultsonch.2025.107593

Yucheng Zhu, Ruosi Zhang, Xueliang Zhu, Xuhai Pan, Michael Short, Lian X. Liu, Madeleine J. Bussemaker

Volume 122, Article 107593 | Journal Article | https://www.sciencedirect.com/science/article/pii/S1350417725003724

Citations: 0

Abstract


Sonochemistry involves complex multiparametric effects and nonlinear interactions that challenge conventional analysis and modelling approaches, especially when extrapolating across systems. Current models mainly depend on dimensional input variables, which limits their generalisability and interpretability. This work proposes a machine learning strategy that integrates physically derived dimensionless variables (Π-terms) into a categorical boosting (CatBoost) algorithm to overcome these limitations. Four representative sonochemical outputs were selected as model targets: sonochemiluminescence (SCL) intensity, SCL area, ultrasonic oxidation from iodide oxidation radicals (IORS), and ultrasonic oxidation from both IORS and H₂O₂. Seven supervised learning algorithms were evaluated: k-nearest neighbours (KNN), linear regression, support vector regression (SVR), random forest, gradient boosting, extreme gradient boosting (XGBoost), and CatBoost. Tree-based models performed best, and CatBoost was ultimately selected as the baseline model. Regression models using the same Π-terms achieved R² = 0.67–0.90 on the full dataset but required dataset-specific corrections to predict independent validation sets. In contrast, the machine learning framework reached higher predictive accuracy (R² = 0.87–0.95 on the reserved test set) and generalised to external validation datasets without additional corrections. Furthermore, a direct comparison between dimensional and dimensionless input strategies showed that dimensionless-input models provided superior generalisability and task-to-task consistency, alleviating plateau effects observed in dimensional models and yielding more stable feature attributions. SHAP analysis highlighted variables associated with cavitation thermal buffering and energy input scaling (>50% combined importance across tasks), offering mechanistic insights into these nonlinear behaviours that regression could not capture.
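
The two quantitative ideas in the abstract, dimensionless inputs and R² on a reserved test set, can be illustrated with a minimal sketch. The abstract does not list the paper's Π-terms, so the quantities and the example group P·f²/(ρ·c⁵) below are purely hypothetical, chosen only because the exponents cancel. The helper names are likewise assumptions made for illustration, and the R² function is the standard coefficient of determination rather than the authors' code.

```python
# Exponents of the base dimensions (mass M, length L, time T) for each
# quantity. These quantities are illustrative assumptions, not the
# variables used in the paper.
DIMENSIONS = {
    "power": (1, 2, -3),        # W = kg m^2 s^-3
    "density": (1, -3, 0),      # kg m^-3
    "sound_speed": (0, 1, -1),  # m s^-1
    "frequency": (0, 0, -1),    # s^-1
}


def group_dimensions(exponents):
    """Return the net (M, L, T) exponents of a product of powers."""
    total = [0, 0, 0]
    for name, power in exponents.items():
        for i, dim in enumerate(DIMENSIONS[name]):
            total[i] += dim * power
    return tuple(total)


def is_dimensionless(exponents):
    """A candidate group is dimensionless when every net exponent is zero."""
    return group_dimensions(exponents) == (0, 0, 0)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical candidate group: P * f^2 / (rho * c^5).
PI_EXAMPLE = {"power": 1, "frequency": 2, "density": -1, "sound_speed": -5}

if __name__ == "__main__":
    print(is_dimensionless(PI_EXAMPLE))
    print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```

A check like `is_dimensionless` is useful before feeding engineered features to any regressor, since a group with leftover units would reintroduce the scale dependence that dimensionless inputs are meant to remove.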

Source journal

Ultrasonics Sonochemistry (Chemistry: Multidisciplinary)

CiteScore: 15.80
Self-citation rate: 11.90%
Articles published: 361
Review time: 59 days

Journal description: Ultrasonics Sonochemistry stands as a premier international journal dedicated to the publication of high-quality research articles primarily focusing on chemical reactions and reactors induced by ultrasonic waves, known as sonochemistry. Beyond chemical reactions, the journal also welcomes contributions related to cavitation-induced events and processing, including sonoluminescence, and the transformation of materials on chemical, physical, and biological levels. Since its inception in 1994, Ultrasonics Sonochemistry has consistently maintained a top ranking in the "Acoustics" category, reflecting its esteemed reputation in the field. The journal publishes exceptional papers covering various areas of ultrasonics and sonochemistry. Its contributions are highly regarded by both academia and industry stakeholders, demonstrating its relevance and impact in advancing research and innovation.