Enhanced prediction of ionic liquid toxicity using a meta-ensemble learning framework with data augmentation
Authors: Safa Sadaghiyanfam, Hiqmet Kamberaj, Yalcin Isler
Journal: Artificial Intelligence Chemistry, Volume 3, Issue 1, Article 100087
Published: 2025-03-05
DOI: 10.1016/j.aichem.2025.100087
URL: https://www.sciencedirect.com/science/article/pii/S2949747725000041
Citations: 0
Abstract
Ionic liquids have unique properties and the potential to serve as green solvents. Concerns about their toxicity remain, however, creating a need for accurate predictive models to support safe design and application. This work introduces a general, robust meta-ensemble learning framework for predicting the toxicity of ionic liquids from molecular descriptors and fingerprints. The proposed model uses Random Forest, Support Vector Regression, Categorical Boosting, and a Chemical Convolutional Neural Network as base learners and Extreme Gradient Boosting as the meta-learner. The framework uses Recursive Feature Elimination for feature selection and GridSearchCV for hyperparameter tuning. Without data augmentation, the model achieves RMSE = 0.38, MAE = 0.29, a coefficient of determination (R²) of 0.87, and a Pearson correlation of 0.94. With data augmentation, performance improves to RMSE = 0.06, MAE = 0.024, R² = 0.99, and a Pearson correlation of 0.99. These results indicate that the data-augmented model outperforms all existing models in robustness and predictive capacity. The framework thus provides a superior tool for computer-aided molecular design of safer and more effective ionic liquids.
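The four scores quoted in the abstract (RMSE, MAE, the coefficient of determination, and the Pearson correlation) can be computed from a vector of measured values and a vector of predictions. The sketch below shows the standard definitions using only the Python standard library. The function name `regression_metrics` and the sample values are illustrative assumptions, not taken from the paper, and the authors' own evaluation code may differ.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, R^2 and Pearson r for two equal-length sequences.

    Hypothetical helper for illustration; not the authors' code.
    """
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    # Residuals between predictions and measured values.
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)                 # root mean squared error
    mae = sum(abs(e) for e in errors) / n        # mean absolute error

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if ss_tot == 0 or var_p == 0:
        raise ValueError("R^2 and Pearson r are undefined for constant inputs")

    # Coefficient of determination: 1 - SS_res / SS_tot.
    r2 = 1.0 - ss_res / ss_tot

    # Pearson correlation: covariance over the product of standard deviations.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    pearson = cov / math.sqrt(ss_tot * var_p)

    return {"rmse": rmse, "mae": mae, "r2": r2, "pearson": pearson}


# Made-up values for illustration only (not data from the paper).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

The Pearson correlation is insensitive to scale and offset errors in the predictions, whereas the coefficient of determination penalizes them. This is why a model can report a Pearson correlation that is higher than the square root of its coefficient of determination.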