Improving Vapor Pressure Prediction Through Integration of Multiple Molecular Representations: A Super Learner Approach

Ji Hyun Nam, Seul Lee, Seongil Jo, Jaeoh Kim, Jooyeon Lee, Jahyun Koo, Byounghwak Lee, Keunhong Jeong, Donghyeon Yu
Journal of Chemometrics, vol. 39, no. 2. Published 2025-02-10. DOI: 10.1002/cem.70003
https://onlinelibrary.wiley.com/doi/10.1002/cem.70003

Abstract

Accurate prediction of vapor pressure is essential in chemical engineering, environmental science, and pharmaceutical development, where it informs assessments of the volatility and stability of compounds. Traditional methods often fall short for complex and novel molecular structures. This study introduces a machine learning approach that integrates graph neural networks (GNNs) and Chem-BERT models to improve prediction accuracy. Using the largest dataset to date, we derived comprehensive chemical descriptors and fingerprints. We evaluated 19 predictive models, including ridge regression, random forest, support vector regression, and feed-forward neural networks, trained on diverse features such as PaDEL and Morgan fingerprints, chemical descriptors, and Chem-BERT embeddings. Central to our methodology is the super learner architecture, which combines the 19 models to enhance accuracy. The super learner achieved a root mean squared error (RMSE) of 0.8200, outperforming the individual models and previous reports. These results highlight the effectiveness of integrating GNNs and Chem-BERT for capturing detailed molecular information, setting a new benchmark for vapor pressure prediction. This study underscores the value of advanced machine learning techniques and comprehensive datasets, offering a robust tool for researchers and paving the way for future advances in chemical property prediction.
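The super learner idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three toy base learners (`fit_mean`, `fit_linear`, `fit_knn`), the synthetic data, the fold scheme, and the grid search over convex weights are all assumptions chosen so the example runs with the Python standard library only. It shows the general pattern: collect out-of-fold predictions from each base model, choose non-negative weights summing to one that minimize the cross-validated RMSE, then refit the base models on all data.

```python
import itertools
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical base learners: each fit(X, y) returns a predict(x) function.
def fit_mean(X, y):
    mu = sum(y) / len(y)
    return lambda x: mu


def fit_linear(X, y):
    """Least-squares line on the first feature only."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, y)) / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x[0] - mx)


def fit_knn(X, y, k=3):
    """k-nearest-neighbour regression with squared Euclidean distance."""
    def predict(x):
        order = sorted(range(len(X)),
                       key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
        nearest = order[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict


def out_of_fold(X, y, learners, n_folds=5):
    """Predict each sample with models trained on the folds that exclude it."""
    n = len(X)
    oof = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, fit in enumerate(learners):
            model = fit(X_tr, y_tr)
            for i in held_out:
                oof[i][j] = model(X[i])
    return oof


def convex_weights(oof, y, steps=20):
    """Grid-search weights on the simplex that minimize cross-validated RMSE."""
    best_w, best_err = None, float("inf")
    for combo in itertools.product(range(steps + 1), repeat=len(oof[0])):
        if sum(combo) != steps:
            continue
        w = [c / steps for c in combo]
        blended = [sum(wj * pj for wj, pj in zip(w, row)) for row in oof]
        err = rmse(y, blended)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


def fit_super_learner(X, y, learners, n_folds=5):
    """Return (predict, weights, cv_rmse) for the weighted ensemble."""
    w, cv_err = convex_weights(out_of_fold(X, y, learners, n_folds), y)
    fitted = [fit(X, y) for fit in learners]  # refit every base model on all data
    return (lambda x: sum(wj * f(x) for wj, f in zip(w, fitted))), w, cv_err


# Synthetic regression data for illustration only (not vapor pressure data).
random.seed(0)
X = [[random.uniform(-2, 2), random.uniform(-2, 2)] for _ in range(60)]
y = [1.5 * a + math.sin(2 * b) + random.gauss(0, 0.2) for a, b in X]

LEARNERS = [fit_mean, fit_linear, fit_knn]
predict, weights, cv_rmse = fit_super_learner(X, y, LEARNERS)
print("weights:", weights)
print("cross-validated RMSE:", round(cv_rmse, 4))
```

Because the weight grid includes the corners of the simplex (all weight on a single model), the ensemble's cross-validated RMSE in this sketch can never exceed that of the best single base learner on the same folds. A real pipeline would replace the toy learners with models trained on molecular features and would typically fit the weights with a constrained regression rather than a grid.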

Source journal: Journal of Chemometrics (Chemistry, Analytical Chemistry)
CiteScore: 5.20
Self-citation rate: 8.30%
Articles published: 78
Review time: 2 months
About the journal: The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.