Understanding the language of molecules: predicting pure component parameters for the PC-SAFT equation of state from SMILES

Journal impact factor: 6.2 (JCR Q1, Chemistry, Multidisciplinary)
Benedikt Winter, Philipp Rehner, Timm Esper, Johannes Schilling and André Bardow
Digital Discovery, 5, 1142-1157. Published 2025-01-29. DOI: 10.1039/D4DD00077C
https://pubs.rsc.org/en/content/articlelanding/2025/dd/d4dd00077c

Abstract

A major bottleneck in developing sustainable processes and materials is a lack of property data. Recently, machine learning approaches have vastly improved previous methods for predicting molecular properties. However, these machine learning models are often not able to handle thermodynamic constraints adequately. In this work, we present a machine learning model based on natural language processing to predict pure-component parameters for the perturbed-chain statistical associating fluid theory (PC-SAFT) equation of state. The model is based on our previously proposed SMILES-to-Properties-Transformer (SPT). By incorporating PC-SAFT into the neural network architecture, the machine learning model is trained directly on experimental vapor pressure and liquid density data. Combining established physical modeling approaches with state-of-the-art machine learning methods enables high-accuracy predictions across a wide range of pressures and temperatures, while keeping the thermodynamic consistency of an equation of state like PC-SAFT. SPT_PC-SAFT demonstrates exceptional prediction accuracy even for complex molecules with various functional groups, outperforming traditional group contribution methods by a factor of four in the mean average percentage deviation. Moreover, SPT_PC-SAFT captures the behavior of stereoisomers without any special consideration. To facilitate the application of our model, we provide predicted PC-SAFT parameters of 13 279 components, making PC-SAFT accessible to all researchers.
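
The abstract compares models by their mean percentage deviation from experimental data. As a minimal sketch of how such a comparison could be computed, the Python snippet below assumes the metric is the mean of |predicted - experimental| / |experimental|, expressed in percent; the abstract does not spell out the definition, so this is an assumption. The function name and all numbers are hypothetical and are not data from the paper.

```python
def mean_absolute_percentage_deviation(predicted, experimental):
    """Mean of |pred - exp| / |exp| over all points, in percent.

    Assumed definition; the abstract does not state the exact formula.
    """
    if len(predicted) != len(experimental):
        raise ValueError("predicted and experimental must have equal length")
    if not experimental:
        raise ValueError("at least one data point is required")
    total = 0.0
    for pred, exp in zip(predicted, experimental):
        if exp == 0:
            raise ValueError("experimental values must be non-zero")
        total += abs(pred - exp) / abs(exp)
    return 100.0 * total / len(experimental)


# Hypothetical vapor pressures in kPa, for illustration only.
experimental_kpa = [10.0, 50.0, 100.0]
model_a_kpa = [11.0, 45.0, 100.0]  # stands in for a more accurate model
model_b_kpa = [14.0, 30.0, 100.0]  # stands in for a less accurate baseline

deviation_a = mean_absolute_percentage_deviation(model_a_kpa, experimental_kpa)
deviation_b = mean_absolute_percentage_deviation(model_b_kpa, experimental_kpa)
improvement_factor = deviation_b / deviation_a
```

Here `improvement_factor` is the ratio of the two deviations, the kind of quantity behind a statement such as "by a factor of four". Because each point is normalized by its experimental value, low-pressure points count as much as high-pressure ones.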
