Syntax-Guided Procedural Synthesis of Molecules
Michael Sun, Alston Lo, Wenhao Gao, Minghao Guo, Veronika Thost, Jie Chen, Connor Coley, Wojciech Matusik
arXiv - QuanBio - Biomolecules, 2024-08-24, arXiv:2409.05873
Abstract
Designing synthetically accessible molecules and recommending analogs to
unsynthesizable molecules are important problems for accelerating molecular
discovery. We reconceptualize both problems using ideas from program synthesis.
Drawing inspiration from syntax-guided synthesis approaches, we decouple the
syntactic skeleton from the semantics of a synthetic tree to create a bilevel
framework for reasoning about the combinatorial space of synthesis pathways.
Given a molecule for which we want to generate analogs, we iteratively refine its
skeletal characteristics via Markov chain Monte Carlo simulations over the
space of syntactic skeletons. Given a black-box oracle to optimize, we
formulate a joint design space over syntactic templates and molecular
descriptors and introduce evolutionary algorithms that optimize both syntactic
and semantic dimensions synergistically. Our key insight is that once the
syntactic skeleton is set, we can amortize the search complexity of
deriving the program's semantics by training policies to fully utilize the
fixed-horizon Markov decision process imposed by the syntactic template. We
demonstrate performance advantages of our bilevel framework for synthesizable
analog generation and synthesizable molecule design. Notably, our approach
offers the user explicit control over the resources required to perform
synthesis and biases the design space towards simpler solutions, making it
particularly promising for autonomous synthesis platforms.
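
To make the idea of Markov chain Monte Carlo over syntactic skeletons concrete, the following is a minimal toy sketch and not the authors' implementation. Everything in it is an assumption made for illustration: a skeleton is reduced to a tuple of reaction arities (1 for a uni-molecular step, 2 for a bi-molecular step), `neighbors` defines hypothetical local edits, and `score` is a stand-in for the similarity signal that the abstract leaves unspecified.

```python
import math
import random

# Hypothetical target skeleton: three reaction steps with arities 2, 1, 2.
TARGET = (2, 1, 2)


def neighbors(skeleton, max_steps=4):
    """Local edits of a toy skeleton: append a step, drop the last step, or flip one arity.

    Every edit has an inverse edit, so the proposal graph is symmetric.
    """
    out = []
    if len(skeleton) < max_steps:
        out.append(skeleton + (1,))
        out.append(skeleton + (2,))
    if len(skeleton) > 1:
        out.append(skeleton[:-1])
    for i, arity in enumerate(skeleton):
        out.append(skeleton[:i] + (3 - arity,) + skeleton[i + 1:])
    return out


def score(skeleton):
    """Toy stand-in for an analog-quality signal: 0 at TARGET, negative elsewhere."""
    mismatches = sum(a != b for a, b in zip(skeleton, TARGET))
    return -(abs(len(skeleton) - len(TARGET)) + mismatches)


def metropolis_hastings(start, neighbors, score, steps, temperature=1.0, seed=0):
    """Run Metropolis-Hastings targeting exp(score / temperature); return the best state seen."""
    rng = random.Random(seed)
    current = best = start
    for _ in range(steps):
        options = neighbors(current)
        proposal = rng.choice(options)
        # Hastings correction for a uniform-over-neighbors proposal,
        # since different skeletons can have different numbers of neighbors.
        log_accept = (score(proposal) - score(current)) / temperature
        log_accept += math.log(len(options)) - math.log(len(neighbors(proposal)))
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            current = proposal
        if score(current) > score(best):
            best = current
    return best


best = metropolis_hastings((1,), neighbors, score, steps=2000)
print(best, score(best))
```

In the bilevel framing of the abstract, a step like this would only handle the syntactic level; filling each skeleton with concrete reactions and building blocks (the semantics) is a separate problem that the sketch does not attempt.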