GFlowNet Pretraining with Inexpensive Rewards

arXiv - QuanBio - Biomolecules Pub Date : 2024-09-15 DOI:arxiv-2409.09702

Mohit Pandey, Gopeshh Subbaraj, Emmanuel Bengio



Abstract



Generative Flow Networks (GFlowNets), a class of generative models, have recently emerged as a suitable framework for generating diverse and high-quality molecular structures by learning from unnormalized reward distributions. Previous work in this direction often restricts exploration by using predefined molecular fragments as building blocks, limiting the chemical space that can be accessed. In this work, we introduce Atomic GFlowNets (A-GFNs), a foundational generative model that uses individual atoms as building blocks to explore drug-like chemical space more comprehensively. We propose an unsupervised pre-training approach using offline drug-like molecule datasets, which conditions A-GFNs on inexpensive yet informative molecular descriptors such as drug-likeness, topological polar surface area, and synthetic accessibility scores. These properties serve as proxy rewards, guiding A-GFNs towards regions of chemical space that exhibit desirable pharmacological properties. We extend our method with a goal-conditioned fine-tuning process, which adapts A-GFNs to optimize for specific target properties. We pretrain A-GFN on the ZINC15 offline dataset and employ robust evaluation metrics to show the effectiveness of our approach compared to other relevant baseline methods in drug design.
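To make the idea of descriptor-based proxy rewards concrete, the following is a minimal sketch, not the authors' implementation. It assumes the descriptor values are already computed, and every name in it (`GOAL_RANGES`, `property_score`, `proxy_reward`, `target_distribution`), the target ranges, the exponential decay, and the product form of the reward are hypothetical choices made for illustration. It shows how cheap descriptors could be turned into an unnormalized reward R(x), and how normalizing R(x) gives the distribution that a GFlowNet is trained to sample from.

```python
import math

# Hypothetical target ranges for three cheap descriptors.
# These values are illustrative assumptions, not taken from the paper.
GOAL_RANGES = {
    "qed": (0.6, 1.0),      # drug-likeness
    "tpsa": (40.0, 100.0),  # topological polar surface area
    "sa": (1.0, 4.0),       # synthetic accessibility score
}


def property_score(value, low, high, decay=1.0):
    """Return 1.0 inside [low, high]; decay exponentially with distance outside."""
    if low <= value <= high:
        return 1.0
    distance = (low - value) if value < low else (value - high)
    return math.exp(-decay * distance / (high - low))


def proxy_reward(descriptors, goal_ranges=GOAL_RANGES):
    """Combine per-property scores into one unnormalized reward (product form is an assumption)."""
    reward = 1.0
    for name, (low, high) in goal_ranges.items():
        reward *= property_score(descriptors[name], low, high)
    return reward


def target_distribution(rewards):
    """Normalize unnormalized rewards R(x) into p(x) = R(x) / Z."""
    total = sum(rewards.values())
    return {x: r / total for x, r in rewards.items()}


# Toy candidates with made-up, precomputed descriptor values.
candidates = {
    "mol_a": {"qed": 0.8, "tpsa": 75.0, "sa": 2.5},
    "mol_b": {"qed": 0.4, "tpsa": 75.0, "sa": 2.5},
    "mol_c": {"qed": 0.8, "tpsa": 160.0, "sa": 5.5},
}
rewards = {name: proxy_reward(d) for name, d in candidates.items()}
probs = target_distribution(rewards)

for name in candidates:
    print(f"{name}: reward={rewards[name]:.4f}, target probability={probs[name]:.4f}")
```

In this toy setting the normalizing constant Z is a sum over three candidates. Over a real chemical space Z is intractable, which is why a model that learns directly from the unnormalized reward is useful. Changing the ranges in `GOAL_RANGES` is a rough analogue of conditioning on a different goal.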
