PepMLM: Target Sequence-Conditioned Generation of Therapeutic Peptide Binders via Span Masked Language Modeling.

ArXiv Pub Date: 2024-08-11
Tianlai Chen, Madeleine Dumas, Rio Watson, Sophia Vincoff, Christina Peng, Lin Zhao, Lauren Hong, Sarah Pertsemlidis, Mayumi Shaepers-Cheu, Tian Zi Wang, Divya Srijay, Connor Monticello, Pranay Vure, Rishab Pulugurta, Kseniia Kholina, Shrey Goel, Matthew P DeLisa, Ray Truant, Hector C Aguilar, Pranam Chatterjee
PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/e9/d4/nihpp-2310.03842v1.PMC10593082.pdf
Citations: 0

Abstract

Target proteins that lack accessible binding pockets and conformational stability have posed increasing challenges for drug development. Induced proximity strategies, such as PROTACs and molecular glues, have thus gained attention as pharmacological alternatives, but still require small-molecule docking at binding pockets for targeted protein degradation. The computational design of protein-based binders presents unique opportunities to access "undruggable" targets, but has often relied on stable 3D structures or structure-influenced latent spaces for effective binder generation. In this work, we introduce PepMLM, a target sequence-conditioned generator of de novo linear peptide binders. By employing a novel span masking strategy that uniquely positions cognate peptide sequences at the C-terminus of target protein sequences, PepMLM fine-tunes the state-of-the-art ESM-2 pLM to fully reconstruct the binder region, achieving low perplexities that match or improve upon those of validated peptide-protein sequence pairs. After successful in silico benchmarking with AlphaFold-Multimer, in which it outperforms RFDiffusion on structured targets, we experimentally verify PepMLM's efficacy by fusing model-derived peptides to E3 ubiquitin ligase domains, demonstrating endogenous degradation of emergent viral phosphoproteins and Huntington's disease-driving proteins. In total, PepMLM enables the generative design of candidate binders to any target protein, without the requirement of target structure, empowering downstream therapeutic applications.
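The span masking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the example sequences, and the use of the literal string `<mask>` as the mask token are assumptions made for illustration, and no ESM-2 model is loaded. The sketch only shows how a target sequence might be followed by a fully masked C-terminal binder span, how labels could be restricted to that span, and how perplexity over the span follows from the probabilities a model assigns to the true residues.

```python
import math

MASK = "<mask>"  # assumed mask token string; placeholder for the model's real token


def build_masked_input(target: str, peptide_len: int) -> list[str]:
    """Return the target residues followed by peptide_len mask tokens."""
    if peptide_len <= 0:
        raise ValueError("peptide_len must be positive")
    # The binder span sits at the C-terminus, i.e. after the last target residue.
    return list(target) + [MASK] * peptide_len


def build_training_pair(target: str, peptide: str):
    """Build (tokens, labels) for reconstructing the whole binder span."""
    tokens = build_masked_input(target, len(peptide))
    # None marks target positions that a loss would ignore;
    # only the masked binder positions carry a label.
    labels = [None] * len(target) + list(peptide)
    return tokens, labels


def perplexity(true_residue_probs: list[float]) -> float:
    """exp of the mean negative log-likelihood over the masked positions."""
    nll = -sum(math.log(p) for p in true_residue_probs) / len(true_residue_probs)
    return math.exp(nll)


# Hypothetical target and peptide, chosen only to show the shapes involved.
tokens, labels = build_training_pair("MKTAYIAKQR", "WLEV")
uniform_ppl = perplexity([0.25] * 4)  # a model that gives each true residue p = 0.25
```

At generation time the same layout would apply with the peptide unknown: only `build_masked_input` is needed, and the model's predictions at the masked positions are decoded into a candidate binder. A lower perplexity over the span means the model assigns higher probability to the reference peptide given the target.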

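PepMLM: Target Sequence-Conditioned Generation of Peptide Binders via Masked Language Modeling.
Target proteins that lack accessible binding pockets and conformational stability have posed increasing challenges for drug development. Induced proximity strategies, such as PROTACs and molecular glues, have thus gained attention as pharmacological alternatives, but still require small-molecule docking at binding pockets for targeted protein degradation (TPD). The computational design of protein-based binders offers unique opportunities to access undruggable targets, but has often relied on stable 3D structures or predictions for effective binder generation. Recently, we leveraged the expressive latent spaces of protein language models (pLMs) to prioritize peptide binders from sequence alone, and then fused them to E3 ubiquitin ligase domains, creating a CRISPR-like TPD system for target proteins. However, our approach relied on training discriminator models to rank the target-binding capability of heuristically or unconditionally derived guide peptides. In this work, we introduce PepMLM, a purely target sequence-conditioned de novo generator of linear peptide binders. By employing a novel masking strategy that uniquely positions cognate peptide sequences at the terminus of target protein sequences, PepMLM enables the state-of-the-art ESM-2 pLM to fully reconstruct the binder region, achieving low perplexities that match or improve upon those of previously validated peptide-protein sequence pairs. After successful in silico benchmarking with AlphaFold-Multimer, we experimentally verify PepMLM's efficacy by fusing model-derived peptides to E3 ubiquitin ligase domains, demonstrating endogenous degradation of target substrates in cellular models. In total, PepMLM enables the generative design of candidate binders to any target protein, without the requirement of target structure, enabling downstream programmable proteome editing applications.
This text was translated by a computer program; where it differs, the English original prevails.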