Yueji Wang, Yu Gu, Qiaoyan Shang, Qingzhu Jia, Qiang Wang, Yin-Ning Zhou and Fangyou Yan
Evaluating the density of organic compounds at variable temperatures by a norm descriptor-based QSPR model
Accurate prediction of the density of organic compounds is essential in chemical engineering. This study develops a quantitative structure–property relationship (QSPR) model by multiple linear regression (MLR), based on a dataset of 5478 organic compounds and 23 866 data points, to predict density over a broad temperature range (115.0 to 594.1 K). Notably, norm indices (NIs) are applied to QSPR modeling of organic compound density for the first time. The model shows strong predictive performance, with a squared correlation coefficient (R²) of 0.9953 and a mean absolute error (MAE) of 10.11 kg m⁻³. Internal, external, and extrapolation validations are applied to assess the model's reliability, accuracy, and generalization. In external validation the model achieves an R² of 0.9951 and an MAE of 9.31 kg m⁻³; in internal validation by leave-one-out cross-validation the corresponding values are 0.9951 and 10.51 kg m⁻³. Extrapolation validation (EV), a recently introduced approach, further supports the model's extrapolation ability: for most descriptors, the root mean square error (RMSE) of the EV test set is well below the training set's standard deviation (σ₉₅ = 140.89 kg m⁻³) and closely matches RMSE_test (model). For NI₈ and NI₂₇, however, the RMSE of the forward test increases significantly once the extrapolation degree (ED) exceeds 0.02, so extrapolating along these two NIs is not recommended. Overall, the results support the robustness and broad applicability of the ρ(NI,T)-QSPR model for predicting organic compound density in industrial applications.
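The fit-and-validate workflow named in the abstract (MLR, R², MAE, leave-one-out cross-validation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the three descriptor columns merely stand in for norm indices, temperature enters as a plain linear term, and R² is computed in its coefficient-of-determination form. It is not the authors' model, descriptor set, or data.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to a descriptor matrix."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def r2_score(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))


def loo_predictions(X, y):
    """Leave-one-out predictions: refit without point i, then predict point i."""
    n = len(y)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef_i = fit_mlr(X[keep], y[keep])
        out[i] = predict(coef_i, X[i:i + 1])[0]
    return out


# Hypothetical synthetic data: 3 made-up descriptors plus temperature in K.
rng = np.random.default_rng(0)
n = 200
descriptors = rng.normal(size=(n, 3))
temperature = rng.uniform(115.0, 594.1, size=n)
X = np.column_stack([descriptors, temperature])
density = (
    900.0
    + descriptors @ np.array([40.0, -25.0, 10.0])
    - 0.8 * (temperature - 298.15)
    + rng.normal(scale=5.0, size=n)  # measurement-like noise, kg m^-3
)

coef = fit_mlr(X, density)
fit = predict(coef, X)
loo = loo_predictions(X, density)

print(f"training: R2={r2_score(density, fit):.4f}  MAE={mae(density, fit):.2f}")
print(f"LOO-CV:   R2={r2_score(density, loo):.4f}  MAE={mae(density, loo):.2f}")
```

For a linear least-squares fit, each leave-one-out residual is at least as large in magnitude as the corresponding training residual, so the cross-validated MAE cannot fall below the training MAE. The explicit refit loop is chosen for readability; for a dataset of the size reported in the abstract, a closed-form leverage-based shortcut would be much faster.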
About the journal:
Molecular Systems Design & Engineering provides a hub for cutting-edge research into how understanding of molecular properties, behaviour and interactions can be used to design and assemble better materials, systems, and processes to achieve specific functions. These may have applications of technological significance and help address global challenges.