Pre-training with a rational approach for antibody sequence representation.

IF 5.7, Zone 2 (Medicine), JCR Q1 (Immunology)
Frontiers in Immunology Pub Date : 2024-10-23 eCollection Date: 2024-01-01 DOI:10.3389/fimmu.2024.1468599
Xiangrui Gao, Changling Cao, Chenfeng He, Lipeng Lai

Abstract

Introduction: Antibodies represent a specific class of proteins produced by the adaptive immune system in response to pathogens. Mining the information embedded in antibody amino acid sequences can benefit both antibody property prediction and novel therapeutic development. However, antibodies possess unique features that should be incorporated using specifically designed training methods, leaving room for improvement in pre-training models for antibody sequences.

Methods: In this study, we present a Pre-trained model of Antibody sequences trained with a Rational Approach for antibodies (PARA). PARA employs a strategy conforming to antibody sequence patterns and an advanced natural language processing self-encoding model structure. This approach addresses the limitations of existing protein pre-training models, which primarily utilize language models without fully considering the differences between protein sequences and language sequences.

Results: We demonstrate PARA's performance on several tasks by comparing it to various published pre-training models of antibodies. The results show that PARA significantly outperforms existing models on these tasks, suggesting that PARA has an advantage in capturing antibody sequence information.

Discussion: The antibody latent representation provided by PARA can substantially facilitate studies in relevant areas. We believe that PARA's superior performance in capturing antibody sequence information offers significant potential for both antibody property prediction and the development of novel therapeutics. PARA is available at https://github.com/xtalpi-xic.

Source journal
CiteScore: 9.80
Self-citation rate: 11.00%
Articles published: 7153
Review time: 14 weeks
About the journal: Frontiers in Immunology is a leading journal in its field, publishing rigorously peer-reviewed research across basic, translational and clinical immunology. This multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics, clinicians and the public worldwide. Frontiers in Immunology is the official journal of the International Union of Immunological Societies (IUIS). Encompassing the entire field of immunology, the journal welcomes papers that investigate basic mechanisms of immune system development and function, with particular emphasis on describing the clinical and immunological phenotype of human immune disorders and defining their molecular basis.