Authors: Yassine Beghour, Yasmina Lahiouel
Journal: Chemical Engineering Science, volume 309, Article 121228 (impact factor 4.3; JCR Q2, Engineering, Chemical)
Published: 2025-01-16
DOI: 10.1016/j.ces.2025.121228
URL: https://www.sciencedirect.com/science/article/pii/S000925092500051X
Using machine learning in QSPR to estimate the boiling and critical temperatures of pure organic compounds
Estimating physical and chemical properties of organic compounds, such as boiling temperature (Tb) and critical temperature (Tc), remains a significant challenge in chemical engineering. Accurate prediction of these properties has been a major research focus because of their importance in various applications. This study develops models using a Quantitative Structure-Property Relationship (QSPR) approach to predict Tb and Tc for 417 and 412 organic compounds, respectively. The models rely on a machine learning algorithm, the multi-layer perceptron artificial neural network (MLP-ANN), for nonlinear modeling, with relevant molecular descriptors as input variables. A comparison with support vector regression (SVR) was conducted to assess the effectiveness of MLP-ANN. The optimal configurations for the MLP-ANN models were (25-17-1) for Tb and (25-14-1) for Tc. Five statistical metrics (R², IOA, MAE, MAPE, and RMSE) were used to measure model accuracy and stability. For the MLP-ANN Tb model, the results were R² = 0.9974, IOA = 0.9992, MAE = 3.6331, MAPE = 1.0165, and RMSE = 4.9321. For the Tc model, the results were R² = 0.9935, IOA = 0.9982, MAE = 7.0545, MAPE = 1.0436, and RMSE = 9.5482. The MLP-ANN models consistently outperformed the SVR models, demonstrating superior accuracy, stability, and generalization. Additionally, the applicability domain (AD) analysis confirmed the reliability and generalizability of the models, with most data points falling within an acceptable range. A comparison with previous models showed that the proposed models surpass them in precision and robustness, highlighting the strong capability of MLP-ANN models to provide accurate predictions.
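The abstract reports five accuracy metrics but does not give their formulas. The following is a minimal sketch of how such metrics are commonly computed, assuming that R² is the coefficient of determination, that IOA is Willmott's index of agreement, and that MAPE is expressed in percent. The function name `regression_metrics` and the temperature values are hypothetical and are not taken from the paper.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, IOA, MAE, MAPE (in percent) and RMSE for paired values.

    Assumed definitions (not confirmed by the abstract):
      R2  = 1 - SS_res / SS_tot
      IOA = 1 - SS_res / sum((|p - mean_o| + |o - mean_o|)^2)  (Willmott)
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    ioa_den = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )

    return {
        "R2": 1.0 - ss_res / ss_tot,
        "IOA": 1.0 - ss_res / ioa_den,
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


if __name__ == "__main__":
    # Hypothetical observed and predicted temperatures, for illustration only.
    observed = [300.0, 350.0, 400.0, 450.0]
    predicted = [302.0, 348.0, 401.0, 449.0]
    for name, value in regression_metrics(observed, predicted).items():
        print(f"{name}: {value:.4f}")
```

Under these definitions MAE and RMSE carry the units of the predicted property, while R², IOA, and MAPE are dimensionless, so the Tb and Tc models are most directly compared on the latter three.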
About the journal:
Chemical engineering enables the transformation of natural resources and energy into useful products for society. It draws on and applies natural sciences, mathematics and economics, and has developed fundamental engineering science that underpins the discipline.
Chemical Engineering Science (CES) has been publishing papers on the fundamentals of chemical engineering since 1951. Since then, CES has been the platform where the most significant advances in the discipline have been published. Chemical Engineering Science has accompanied and sustained chemical engineering through its development into the vibrant and broad scientific discipline it is today.