Multi-Modal CLIP-Informed Protein Editing.

Health Data Science, vol. 4, article 0211. Published: 2024-12-19. eCollection date: 2024-01-01. DOI: 10.34133/hds.0211

Mingze Yin, Hanjing Zhou, Yiheng Zhu, Miao Lin, Yixuan Wu, Jialu Wu, Hongxia Xu, Chang-Yu Hsieh, Tingjun Hou, Jintai Chen, Jian Wu



Abstract

Background: Proteins govern most biological functions essential for life, and controllable protein editing has enabled great advances in probing natural systems, creating therapeutic conjugates, and generating novel protein constructs. Recently, machine learning-assisted protein editing (MLPE) has shown promise in accelerating optimization cycles and reducing experimental workloads. However, current methods struggle with the vast combinatorial space of potential protein edits and cannot explicitly edit proteins from biotext instructions, which limits their interactivity with human feedback.

Methods: To fill these gaps, we propose ProtET, a novel method for efficient CLIP-informed protein editing through multi-modality learning. Our approach comprises two stages. In the pretraining stage, contrastive learning aligns protein and biotext representations encoded by two large language models (LLMs). In the subsequent protein editing stage, the fused features of the editing instruction text and the original protein sequence serve as the final editing condition for generating the target protein sequence.

Results: Comprehensive experiments demonstrated the superiority of ProtET in editing proteins to enhance human-expected functionality across multiple attribute domains, including enzyme catalytic activity, protein stability, and antibody-specific binding ability. ProtET improves on state-of-the-art results by a large margin, with stability improvements of 16.67% and 16.90%.

Conclusions: This capability positions ProtET to advance real-world artificial protein editing, potentially addressing unmet academic, industrial, and clinical needs.
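The pretraining stage described in the Methods can be illustrated with a generic CLIP-style symmetric contrastive (InfoNCE) loss. This is a minimal sketch, not the authors' implementation: the function names, the temperature of 0.07, and the two-dimensional stand-in vectors are all assumptions made for illustration, and the two LLM encoders are replaced by hand-written embeddings.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_contrastive_loss(protein_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    protein_emb[i] and text_emb[i] are assumed to describe the same protein.
    The temperature value is an illustrative assumption.
    """
    p = [l2_normalize(v) for v in protein_emb]
    t = [l2_normalize(v) for v in text_emb]
    n = len(p)
    # Cosine similarity between every protein and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(p[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Protein -> text direction: row i should assign most mass to column i.
    loss_p2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> protein direction: column j should assign most mass to row j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2p = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_p2t + loss_t2p)


# Hypothetical stand-in embeddings: correctly paired vs. deliberately mismatched.
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = clip_contrastive_loss(aligned, aligned)
loss_shuffled = clip_contrastive_loss(aligned, shuffled)
```

In this toy batch the correctly paired embeddings yield a much lower loss than the mismatched ones, which is the signal that pulls matching protein and text representations together during alignment.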


