S2ALM: Sequence-Structure Pre-trained Large Language Model for Comprehensive Antibody Representation Learning.

IF 10.7 | Zone 1 (Multidisciplinary) | JCR Q1 Multidisciplinary
Research | Pub Date: 2025-08-19 | eCollection Date: 2025-01-01 | DOI: 10.34133/research.0721
Mingze Yin, Hanjing Zhou, Jialu Wu, Yiheng Zhu, Yuxuan Zhan, Zitai Kong, Hongxia Xu, Chang-Yu Hsieh, Jintai Chen, Tingjun Hou, Jian Wu
Research, vol. 8, article 0721 (2025). PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12364524/pdf/
Citations: 0

Abstract

Antibodies safeguard our health through precise and potent binding to specific antigens and have shown promising therapeutic efficacy against numerous diseases, including COVID-19. Recent advances in biomedical language models have demonstrated great potential for interpreting complex biological structures and functions. However, existing antibody-specific models share a notable limitation: they do not explicitly consider antibody structural information, even though the 1-dimensional sequence and the 3-dimensional structure carry unique and complementary insights into antibody behavior and function. This paper proposes the Sequence-Structure multi-level pre-trained Antibody Language Model (S2ALM), which combines holistic sequential and structural information in one unified, generic antibody foundation model. We construct a hierarchical pre-training paradigm with 2 customized multi-level training objectives to facilitate the modeling of comprehensive antibody representations. S2ALM's representation space uncovers inherent functional binding mechanisms, biological evolution properties, and structural interaction patterns. Pre-trained on over 75 million sequences and 11.7 million structures, S2ALM can be applied to diverse downstream tasks: accurately predicting antigen-antibody binding affinities, precisely distinguishing B cell maturation stages, identifying crucial antibody binding positions, and designing novel coronavirus-binding antibodies. S2ALM outperforms well-established baselines and sets new state-of-the-art performance across a wide range of antibody-specific understanding and generation tasks. Its ability to model comprehensive and generalized representations positions it to advance real-world therapeutic antibody development and potentially address unmet academic, industrial, and clinical needs.
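The abstract does not spell out S2ALM's two training objectives, so the following is only a generic illustration of one way sequence and structure can share a single token stream for language-model pre-training. It assumes that each residue's 3D context has already been discretized into a symbolic structure token (the "s0".."s3" labels below are hypothetical placeholders), pairs every amino acid with its structure token, and applies BERT-style random masking. The helper names `build_joint_tokens` and `mask_tokens` are illustrative and are not taken from the paper or any library.

```python
import random

MASK = "<mask>"  # hypothetical mask token


def build_joint_tokens(seq: str, struct_tokens: list[str]) -> list[str]:
    """Pair each residue with an assumed per-residue structure token, e.g. 'E|s0'."""
    if len(seq) != len(struct_tokens):
        raise ValueError("need exactly one structure token per residue")
    return [f"{aa}|{st}" for aa, st in zip(seq, struct_tokens)]


def mask_tokens(tokens: list[str], mask_rate: float = 0.15, seed: int = 0):
    """Randomly mask a fraction of tokens; return the masked list and the targets to predict."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for i in positions:
        targets[i] = masked[i]  # the model would be trained to recover this joint token
        masked[i] = MASK
    return masked, targets


if __name__ == "__main__":
    # Hypothetical heavy-chain fragment with placeholder structure tokens.
    tokens = build_joint_tokens("EVQLVESGGG", [f"s{i % 4}" for i in range(10)])
    masked, targets = mask_tokens(tokens, mask_rate=0.2, seed=1)
    print(masked)
    print(targets)
```

Because each position carries both an amino acid and a structure symbol, recovering a masked token forces a model to use sequence and structural context together; the actual S2ALM objectives may differ substantially from this sketch.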

Source journal: Research (Multidisciplinary)
CiteScore: 13.40
Self-citation rate: 3.60%
Articles published: 0
Review time: 14 weeks
Journal description: Research serves as a global platform for academic exchange, collaboration, and technological advancement. The journal welcomes high-quality research contributions from any domain and from authors worldwide. In addition to fundamental research in the life and physical sciences, it highlights significant findings and issues in engineering and applied science. It publishes original research articles, reviews, perspectives, and editorials.