Graph neural processes for molecules: an evaluation on docking scores and strategies to improve generalization

IF 7.1 2区化学 Q1 CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics Pub Date : 2024-10-23 DOI:10.1186/s13321-024-00904-2

Miguel García-Ortegón, Srijit Seal, Carl Rasmussen, Andreas Bender, Sergio Bacallado

{"title":"Graph neural processes for molecules: an evaluation on docking scores and strategies to improve generalization","authors":"Miguel García-Ortegón, Srijit Seal, Carl Rasmussen, Andreas Bender, Sergio Bacallado","doi":"10.1186/s13321-024-00904-2","DOIUrl":null,"url":null,"abstract":"<p>Neural processes (NPs) are models for meta-learning which output uncertainty estimates. So far, most studies of NPs have focused on low-dimensional datasets of highly-correlated tasks. While these homogeneous datasets are useful for benchmarking, they may not be representative of realistic transfer learning. In particular, applications in scientific research may prove especially challenging due to the potential novelty of meta-testing tasks. Molecular property prediction is one such research area that is characterized by sparse datasets of many functions on a shared molecular space. In this paper, we study the application of graph NPs to molecular property prediction with DOCKSTRING, a diverse dataset of docking scores. Graph NPs show competitive performance in few-shot learning tasks relative to supervised learning baselines common in chemoinformatics, as well as alternative techniques for transfer learning and meta-learning. In order to increase meta-generalization to divergent test functions, we propose fine-tuning strategies that adapt the parameters of NPs. We find that adaptation can substantially increase NPs' regression performance while maintaining good calibration of uncertainty estimates. Finally, we present a Bayesian optimization experiment which showcases the potential advantages of NPs over Gaussian processes in iterative screening. Overall, our results suggest that NPs on molecular graphs hold great potential for molecular property prediction in the low-data setting.</p><p>Neural processes are a family of meta-learning algorithms which deal with data scarcity by transferring information across tasks and making probabilistic predictions. We evaluate their performance on regression and optimization molecular tasks using docking scores, finding them to outperform classical single-task and transfer-learning models. We examine the issue of generalization to divergent test tasks, which is a general concern of meta-learning algorithms in science, and propose strategies to alleviate it.</p>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"16 1","pages":""},"PeriodicalIF":7.1000,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00904-2","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cheminformatics","FirstCategoryId":"92","ListUrlMain":"https://link.springer.com/article/10.1186/s13321-024-00904-2","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}

引用次数: 0

Abstract

Neural processes (NPs) are models for meta-learning which output uncertainty estimates. So far, most studies of NPs have focused on low-dimensional datasets of highly-correlated tasks. While these homogeneous datasets are useful for benchmarking, they may not be representative of realistic transfer learning. In particular, applications in scientific research may prove especially challenging due to the potential novelty of meta-testing tasks. Molecular property prediction is one such research area that is characterized by sparse datasets of many functions on a shared molecular space. In this paper, we study the application of graph NPs to molecular property prediction with DOCKSTRING, a diverse dataset of docking scores. Graph NPs show competitive performance in few-shot learning tasks relative to supervised learning baselines common in chemoinformatics, as well as alternative techniques for transfer learning and meta-learning. In order to increase meta-generalization to divergent test functions, we propose fine-tuning strategies that adapt the parameters of NPs. We find that adaptation can substantially increase NPs' regression performance while maintaining good calibration of uncertainty estimates. Finally, we present a Bayesian optimization experiment which showcases the potential advantages of NPs over Gaussian processes in iterative screening. Overall, our results suggest that NPs on molecular graphs hold great potential for molecular property prediction in the low-data setting.

Neural processes are a family of meta-learning algorithms which deal with data scarcity by transferring information across tasks and making probabilistic predictions. We evaluate their performance on regression and optimization molecular tasks using docking scores, finding them to outperform classical single-task and transfer-learning models. We examine the issue of generalization to divergent test tasks, which is a general concern of meta-learning algorithms in science, and propose strategies to alleviate it.

查看原文本刊更多论文

分子的图神经过程：对接得分评估和提高通用性的策略

神经过程（NP）是一种元学习模型，可输出不确定性估计值。迄今为止，大多数关于 NP 的研究都集中在高度相关任务的低维数据集上。虽然这些同质数据集有助于制定基准，但它们可能并不能代表现实的迁移学习。特别是，由于元测试任务的潜在新颖性，科学研究中的应用可能证明特别具有挑战性。分子性质预测就是这样一个研究领域，其特点是共享分子空间上许多函数的稀疏数据集。在本文中，我们利用 DOCKSTRING（一个多样化的对接得分数据集）研究了图 NP 在分子性质预测中的应用。与化学信息学中常见的监督学习基线以及迁移学习和元学习的替代技术相比，图 NPs 在少量学习任务中表现出了极具竞争力的性能。为了提高对不同测试函数的元泛化能力，我们提出了调整 NPs 参数的微调策略。我们发现，调整可以大幅提高 NPs 的回归性能，同时保持不确定性估计的良好校准。最后，我们介绍了一个贝叶斯优化实验，该实验展示了 NPs 在迭代筛选中相对于高斯过程的潜在优势。总之，我们的研究结果表明，分子图上的神经过程在低数据环境下的分子性质预测方面具有巨大潜力。神经过程是元学习算法的一个系列，它通过跨任务传递信息和进行概率预测来应对数据稀缺问题。我们利用对接得分评估了它们在回归和优化分子任务上的性能，发现它们优于经典的单一任务和迁移学习模型。我们研究了元学习算法在科学领域普遍关注的对不同测试任务的泛化问题，并提出了缓解这一问题的策略。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Journal of Cheminformatics CHEMISTRY, MULTIDISCIPLINARY-COMPUTER SCIENCE, INFORMATION SYSTEMS

CiteScore

14.10

自引率

7.00%

发文量

审稿时长

3 months

期刊介绍： Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.