Analysis of uncertainty of neural fingerprint-based models.

IF 3.3, Zone 3 (Chemistry), JCR Q2 (CHEMISTRY, PHYSICAL)
Christian W Feldmann, Jochen Sieg, Miriam Mathea
Faraday Discussions. Journal Article. Published 2024-09-25. DOI: 10.1039/d4fd00095a (https://doi.org/10.1039/d4fd00095a)

Abstract

Machine learning has gained popularity for predicting molecular properties based on molecular structure. This study explores the uncertainty estimates of neural fingerprint-based models by comparing pure graph neural networks (GNN) to classical machine learning algorithms combined with neural fingerprints. We investigate the advantage of extracting the neural fingerprint from the GNN and integrating it into a method known for producing better-calibrated probability estimates. Comparisons are made using three classical machine learning methods and the Chemprop model, considering different molecular representations and calibration techniques. We utilize 19 datasets from Toxcast, reflecting real-world scenarios with balanced accuracies ranging from 0.6 to 0.8. Results demonstrate that neural fingerprints combined with classical machine learning methods exhibit a slight decrease in prediction performance compared to the native Chemprop model. However, these models provide significantly improved uncertainty estimates. Notably, uncertainty estimates of neural fingerprint-based methods remain relatively robust for molecules dissimilar to the training set. This suggests that methods like random forest with neural fingerprints can deliver strong prediction performance and reliable uncertainty estimates. When considering both performance and uncertainty, the calibrated Chemprop model and the combination of neural fingerprints with random forest or support vector classifier (SVC) yield comparable results. Surprisingly, the SVC method shows promising performance when combined with neural or count fingerprints. These findings are particularly relevant in real-world industrial projects where accurate predictions and reliable uncertainty estimates are crucial.
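The abstract compares models on two axes: prediction performance (balanced accuracy) and the quality of their probability estimates. The sketch below illustrates how those two axes can diverge, using pure Python and no external libraries. It is an assumption-laden illustration, not the authors' evaluation code: the abstract does not say which calibration metrics were used, so the Brier score and expected calibration error (ECE) are stand-ins chosen because they are common. The function names and the toy labels and probabilities are hypothetical and are not results from the paper.

```python
from typing import List, Sequence, Tuple


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> float:
    """Weighted mean gap between predicted probability and observed
    positive rate, computed over equal-width probability bins."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((t, p))
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(t for t, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += len(members) / n * abs(observed - predicted)
    return ece


# Hypothetical toy data: two models that make identical class decisions
# at a 0.5 threshold but report very different confidence.
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
overconfident = [0.01, 0.02, 0.99, 0.98, 0.97, 0.03, 0.01, 0.99]
moderate = [0.2, 0.3, 0.7, 0.8, 0.7, 0.3, 0.2, 0.8]

pred_over = [int(p >= 0.5) for p in overconfident]
pred_mod = [int(p >= 0.5) for p in moderate]

ba_over = balanced_accuracy(y_true, pred_over)
ba_mod = balanced_accuracy(y_true, pred_mod)
brier_over = brier_score(y_true, overconfident)
brier_mod = brier_score(y_true, moderate)
ece_over = expected_calibration_error(y_true, overconfident)
ece_mod = expected_calibration_error(y_true, moderate)

print("balanced accuracy:", ba_over, ba_mod)
print("Brier score:", brier_over, brier_mod)
print("ECE:", ece_over, ece_mod)
```

Both toy models reach the same balanced accuracy, yet the moderate one scores better on both calibration measures. This mirrors the kind of trade-off the abstract describes, where similar prediction performance can come with noticeably different uncertainty quality.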

Source journal
Faraday Discussions (Chemistry, Physical Chemistry)
Self-citation rate: 0.00%
Articles published: 259
Journal description: Discussion summary and research papers from discussion meetings that focus on rapidly developing areas of physical chemistry and its interfaces.