Evaluating the density of organic compounds at variable temperatures by a norm descriptor-based QSPR model†

IF 3.2, Zone 3 (Engineering and Technology), JCR Q2 (Chemistry, Physical)
Yueji Wang, Yu Gu, Qiaoyan Shang, Qingzhu Jia, Qiang Wang, Yin-Ning Zhou and Fangyou Yan
DOI: 10.1039/D5ME00035A
Journal: Molecular Systems Design & Engineering, volume 9, pages 776-789
Published: 2025-06-04
URL: https://pubs.rsc.org/en/content/articlelanding/2025/me/d5me00035a

Abstract

Accurately predicting the density of organic compounds is essential in chemical engineering. This study develops a robust quantitative structure–property relationship (QSPR) model by multiple linear regression (MLR), based on a comprehensive dataset of 5478 organic compounds and 23 866 data points, to predict density over a broad temperature range (115.0 to 594.1 K). Notably, norm indices (NIs) are applied to QSPR modeling of organic compound density for the first time. The model demonstrates excellent predictive performance, with a squared correlation coefficient (R²) of 0.9953 and a mean absolute error (MAE) of 10.11 kg m⁻³. Rigorous internal, external, and extrapolation validations are applied to confirm the model's reliability, accuracy, and generalization. In external validation the model achieves an R² of 0.9951 and an MAE of 9.31 kg m⁻³; in internal validation by leave-one-out cross-validation, the corresponding values are 0.9951 and 10.51 kg m⁻³. Extrapolation validation (EV), a recently introduced approach, further confirms the model's extrapolation ability: for most descriptors, the test-set root mean square error in EV, RMSE_test (EV), is well below the training set's standard deviation (σ₉₅ = 140.89 kg m⁻³) and closely aligns with RMSE_test (model). The RMSE of the forward test increases significantly for NI₈ and NI₂₇ when the extrapolation degree (ED) exceeds 0.02, so applying these two NIs for extrapolation is not recommended. Overall, the results validate the robustness and broad applicability of the ρ(NI,T)-QSPR model, confirming its reliability for predicting the density of organic compounds in industrial applications.
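The fitting and internal-validation workflow named in the abstract (an MLR fit, then R² and MAE on the fit and under leave-one-out cross-validation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the three "descriptor" columns are random stand-ins rather than the paper's norm indices, and the coefficients in `true_coef` are invented. It is not the authors' model or code.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply an intercept-first coefficient vector to a feature matrix."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def loo_predictions(X, y):
    """Leave-one-out: refit without sample i, then predict sample i."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        coef = fit_mlr(X[mask], y[mask])
        preds[i] = predict_mlr(coef, X[i:i + 1])[0]
    return preds

# Synthetic stand-in data (hypothetical): 3 structural descriptors plus temperature.
rng = np.random.default_rng(0)
n = 200
descriptors = rng.normal(size=(n, 3))
temperature = rng.uniform(115.0, 594.1, size=n)  # K, the range quoted in the abstract
X = np.column_stack([descriptors, temperature])
true_coef = np.array([900.0, 40.0, -25.0, 10.0, -0.8])  # invented for this sketch
y = predict_mlr(true_coef, X) + rng.normal(scale=5.0, size=n)  # density-like, kg m^-3

coef = fit_mlr(X, y)
y_fit = predict_mlr(coef, X)
y_loo = loo_predictions(X, y)

print("fit: R2 =", r2_score(y, y_fit), "MAE =", mae(y, y_fit))
print("LOO: R2 =", r2_score(y, y_loo), "MAE =", mae(y, y_loo))
```

For an ordinary least-squares fit, each leave-one-out residual is at least as large in magnitude as the corresponding training residual, so the LOO R² cannot exceed the fitted R². A small gap between the two, as the abstract reports (0.9953 against 0.9951), suggests the fit is not driven by individual data points.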

Source journal
Molecular Systems Design & Engineering (Engineering: Biomedical Engineering)
CiteScore: 6.40
Self-citation rate: 2.80%
Articles published: 144
Journal description: Molecular Systems Design & Engineering provides a hub for cutting-edge research into how understanding of molecular properties, behaviour and interactions can be used to design and assemble better materials, systems, and processes to achieve specific functions. These may have applications of technological significance and help address global challenges.