Cell Morphology-Guided Small Molecule Generation with GFlowNets

arXiv - QuanBio - Biomolecules, Pub Date: 2024-08-09, DOI: arxiv-2408.05196

Stephen Zhewen Lu, Ziqing Lu, Ehsan Hajiramezanali, Tommaso Biancalani, Yoshua Bengio, Gabriele Scalia, Michał Koziarski

Citations: 0

Abstract

High-content phenotypic screening, including high-content imaging (HCI), has gained popularity in the last few years for its ability to characterize novel therapeutics without prior knowledge of the protein target. When combined with deep learning techniques to predict and represent molecular-phenotype interactions, these advancements hold the potential to significantly accelerate and enhance drug discovery applications. This work focuses on the novel task of HCI-guided molecular design. Generative models for molecule design could be guided by HCI data, for example with a supervised model that links molecules to phenotypes of interest as a reward function. However, limited labeled data, combined with the high-dimensional readouts, can make training these methods challenging and impractical. We consider an alternative approach in which we leverage an unsupervised multimodal joint embedding to define a latent similarity as a reward for GFlowNets. The proposed model learns to generate new molecules that could produce phenotypic effects similar to those of the given image target, without relying on pre-annotated phenotypic labels. We demonstrate that the proposed method generates molecules with high morphological and structural similarity to the target, increasing the likelihood of similar biological activity, as confirmed by an independent oracle model.
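
As a rough illustration of the reward idea described in the abstract, the sketch below scores candidate molecules by their latent similarity to a target image in a shared embedding space. The abstract does not give the reward's functional form, so everything here is an assumption: the embeddings are hard-coded toy vectors standing in for the outputs of the (unspecified) molecule and image encoders, the similarity is taken to be cosine similarity, and the `exp(beta * similarity)` shaping with `beta = 4.0` is a hypothetical choice made only to keep the reward strictly positive, as GFlowNet training requires.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for degenerate embeddings
    return dot / (norm_u * norm_v)


def latent_reward(
    mol_embedding: Sequence[float],
    target_image_embedding: Sequence[float],
    beta: float = 4.0,
) -> float:
    """Hypothetical reward: exp(beta * cosine similarity).

    The exponential keeps the reward strictly positive; beta is an
    assumed sharpness parameter, not a value taken from the paper.
    """
    similarity = cosine_similarity(mol_embedding, target_image_embedding)
    return math.exp(beta * similarity)


# Toy stand-ins for joint-embedding outputs (assumed, not real data).
target = [1.0, 0.0, 0.0]
candidates: Dict[str, Sequence[float]] = {
    "mol_a": [0.9, 0.1, 0.0],   # close to the target in latent space
    "mol_b": [0.0, 1.0, 0.0],   # orthogonal to the target
    "mol_c": [-1.0, 0.0, 0.0],  # opposite to the target
}

rewards = {name: latent_reward(z, target) for name, z in candidates.items()}

# A well-trained GFlowNet samples terminal objects with probability
# proportional to their reward; this is that ideal target distribution.
total = sum(rewards.values())
sampling_probs = {name: r / total for name, r in rewards.items()}

for name in candidates:
    print(f"{name}: reward={rewards[name]:.4f}, p={sampling_probs[name]:.4f}")
```

Because a GFlowNet aims to sample in proportion to reward rather than to maximize it, molecules with moderately high latent similarity keep a non-negligible sampling probability, which is what allows diverse candidates to be generated for a single image target.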
