Analysis of uncertainty of neural fingerprint-based models†

IF 3.4 · Zone 3 (Chemistry) · JCR Q2 (Chemistry)

Faraday Discussions Pub Date : 2024-09-25 DOI:10.1039/D4FD00095A

Christian W. Feldmann, Jochen Sieg and Miriam Mathea

Faraday Discussions, volume 256, pages 551-567


Abstract

Machine learning has gained popularity for predicting molecular properties based on molecular structure. This study explores the uncertainty estimates of neural fingerprint-based models by comparing pure graph neural networks (GNNs) to classical machine learning algorithms combined with neural fingerprints. We investigate the advantage of extracting the neural fingerprint from the GNN and integrating it into a method known for producing better-calibrated probability estimates. Comparisons are made using three classical machine learning methods and the Chemprop model, considering different molecular representations and calibration techniques. We utilize 19 datasets from ToxCast, reflecting real-world scenarios with balanced accuracies ranging from 0.6 to 0.8. Results demonstrate that neural fingerprints combined with classical machine learning methods exhibit a slight decrease in prediction performance compared to the native Chemprop model. However, these models provide significantly improved uncertainty estimates. Notably, uncertainty estimates of neural fingerprint-based methods remain relatively robust for molecules dissimilar to the training set. This suggests that methods like random forest with neural fingerprints can deliver strong prediction performance and reliable uncertainty estimates. When considering both performance and uncertainty, the calibrated Chemprop model and the combination of neural fingerprints with random forest or support vector classifier (SVC) yield comparable results. Surprisingly, the SVC method shows promising performance when combined with neural or count fingerprints. These findings are particularly relevant in real-world industrial projects where accurate predictions and reliable uncertainty estimates are crucial.
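
The two quantities the abstract weighs against each other, prediction performance and the quality of probability estimates, can be illustrated with a minimal sketch. Balanced accuracy is the metric named in the abstract. The Brier score is used here only as one common way to score probability estimates and is an assumption, since the abstract does not say which uncertainty metric the study uses. The toy labels and probabilities are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (0 or 1)."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in y_true")
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and label.

    Assumption: chosen here as an illustrative calibration-sensitive score;
    lower is better.
    """
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 3 active and 5 inactive compounds.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.2, 0.1, 0.7, 0.3, 0.2]

# Threshold the predicted probabilities at 0.5 to obtain class labels.
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.3f}")
print(f"Brier score:       {brier_score(y_true, y_prob):.3f}")
```

Two models can reach the same balanced accuracy while differing in Brier score, which is why the abstract treats performance and uncertainty estimates as separate criteria.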


