Authors: Chunbin Gu, Mutian He, Hanqun Cao, Guangyong Chen, Chang-yu Hsieh, Pheng Ann Heng
DOI: arxiv-2409.05916
Published: 2024-09-07, arXiv - QuanBio - Quantitative Methods
Citation count: 0
Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries
Abstract
In the realm of drug discovery, DNA-encoded library (DEL) screening
technology has emerged as an efficient method for identifying high-affinity
compounds. However, DEL screening faces a significant challenge: noise arising
from nonspecific interactions within complex biological systems. Neural
networks trained on DEL data have been employed to extract compound
features, aiming to denoise the data and uncover potential binders to the
desired therapeutic target. Nevertheless, the inherent structure of DEL,
constrained by the limited diversity of building blocks, limits the
performance of compound encoders. Moreover, existing methods capture
compound features at only a single level, further limiting the effectiveness of the
denoising strategy. To mitigate these issues, we propose a Multimodal
Pretraining DEL-Fusion model (MPDF) that enhances encoder capabilities through
pretraining and integrates compound features across various scales. We develop
pretraining tasks applying contrastive objectives between different compound
representations and their text descriptions, enhancing the compound encoders'
ability to acquire generic features. Furthermore, we propose a novel DEL-fusion
framework that amalgamates compound information at the atomic, submolecular,
and molecular levels, as captured by various compound encoders. The synergy of
these innovations equips MPDF with enriched, multi-scale features, enabling
comprehensive downstream denoising. Evaluated on three DEL datasets, MPDF
demonstrates superior performance in data processing and analysis for
validation tasks. Notably, MPDF offers novel insights into identifying
high-affinity molecules, paving the way for improved DEL utility in drug
discovery.
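The abstract names contrastive objectives between compound representations and their text descriptions but does not give their exact form. Below is a minimal sketch, assuming the symmetric InfoNCE loss popularized by CLIP. Within a batch, the i-th compound embedding and the i-th text embedding form the positive pair, and every other pairing serves as a negative. The function names, the temperature of 0.07, and the use of plain Python lists in place of encoder outputs are illustrative assumptions, not MPDF's published implementation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against an all-zero input
    return [x / norm for x in v]


def diagonal_cross_entropy(logits: List[Vector]) -> float:
    """Mean cross-entropy over rows, where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(compound_emb: List[Vector], text_emb: List[Vector],
                       temperature: float = 0.07) -> float:
    """Assumed CLIP-style objective (not confirmed as MPDF's): pair i is
    (compound_i, text_i); every other pairing in the batch is a negative."""
    if len(compound_emb) != len(text_emb):
        raise ValueError("each compound needs exactly one paired text description")
    c = [l2_normalize(v) for v in compound_emb]
    t = [l2_normalize(v) for v in text_emb]
    # sim[i][j] = cosine(compound_i, text_j) / temperature
    sim = [[sum(a * b for a, b in zip(ci, tj)) / temperature for tj in t] for ci in c]
    sim_t = [list(col) for col in zip(*sim)]  # text-to-compound direction
    return 0.5 * (diagonal_cross_entropy(sim) + diagonal_cross_entropy(sim_t))


if __name__ == "__main__":
    compounds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    matched = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
    shuffled = [matched[1], matched[2], matched[0]]
    print(symmetric_info_nce(compounds, matched))   # aligned pairs: loss near zero
    print(symmetric_info_nce(compounds, shuffled))  # mismatched pairs: much larger loss
```

Averaging both directions pulls the compound encoder and the text encoder toward a shared embedding space. With several compound encoders, one such loss per encoder-text pair would be a natural extension, but the abstract does not state how MPDF pairs them.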
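The abstract also does not specify how the DEL-fusion framework combines atomic, submolecular, and molecular features. The sketch below uses the simplest stand-in: it mean-pools each variable-length level and concatenates the results with the molecule-level vector. The function names and the pooling-plus-concatenation scheme are hypothetical, and the published framework may instead use learned weighting or attention.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a variable-length set of equal-width feature vectors into one vector."""
    if not vectors:
        raise ValueError("need at least one vector to pool")
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def fuse_multiscale(atom_feats: Sequence[Sequence[float]],
                    fragment_feats: Sequence[Sequence[float]],
                    molecule_feat: Sequence[float]) -> Vector:
    """Hypothetical stand-in for DEL-fusion: pool the atom-level and
    submolecular (fragment) features, then concatenate them with the
    molecule-level vector into one multi-scale representation."""
    atom_summary = mean_pool(atom_feats)          # atomic level: one vector per atom
    fragment_summary = mean_pool(fragment_feats)  # submolecular level: one vector per fragment
    return atom_summary + fragment_summary + list(molecule_feat)  # list concatenation


if __name__ == "__main__":
    atoms = [[1.0, 2.0], [3.0, 4.0]]  # two atoms, 2-d features each
    fragments = [[0.0, 1.0]]          # one fragment, 2-d feature
    molecule = [5.0, 6.0, 7.0]        # one molecule-level vector, 3-d
    print(fuse_multiscale(atoms, fragments, molecule))  # [2.0, 3.0, 0.0, 1.0, 5.0, 6.0, 7.0]
```

Concatenation preserves each level's features for a downstream denoising model at the cost of a wider input. A learned gate or attention over levels would trade that width for adaptivity.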