Linguistics-based formalization of the antibody language as a basis for antibody language models

IF 12 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Mai Ha Vu, Philippe A. Robert, Rahmad Akbar, Bartlomiej Swiatczak, Geir Kjetil Sandve, Dag Trygve Truslew Haug, Victor Greiff
{"title":"Linguistics-based formalization of the antibody language as a basis for antibody language models","authors":"Mai Ha Vu, Philippe A. Robert, Rahmad Akbar, Bartlomiej Swiatczak, Geir Kjetil Sandve, Dag Trygve Truslew Haug, Victor Greiff","doi":"10.1038/s43588-024-00642-3","DOIUrl":null,"url":null,"abstract":"Apparent parallels between natural language and antibody sequences have led to a surge in deep language models applied to antibody sequences for predicting cognate antigen recognition. However, a linguistic formal definition of antibody language does not exist, and insight into how antibody language models capture antibody-specific binding features remains largely uninterpretable. Here we describe how a linguistic formalization of the antibody language, by characterizing its tokens and grammar, could address current challenges in antibody language model rule mining. The parallels between natural language and antibody sequences could serve as a stepping stone to using deep language models for analyzing antibody sequences. This Perspective discusses how issues in antibody language model rule mining could be addressed by linguistically formalizing the antibody language.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":"4 6","pages":"412-422"},"PeriodicalIF":12.0000,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature computational science","FirstCategoryId":"1085","ListUrlMain":"https://www.nature.com/articles/s43588-024-00642-3","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

Abstract

Apparent parallels between natural language and antibody sequences have led to a surge in deep language models applied to antibody sequences for predicting cognate antigen recognition. However, a linguistic formal definition of antibody language does not exist, and insight into how antibody language models capture antibody-specific binding features remains largely uninterpretable. Here we describe how a linguistic formalization of the antibody language, by characterizing its tokens and grammar, could address current challenges in antibody language model rule mining. The parallels between natural language and antibody sequences could serve as a stepping stone to using deep language models for analyzing antibody sequences. This Perspective discusses how issues in antibody language model rule mining could be addressed by linguistically formalizing the antibody language.

Abstract Image

基于语言学的抗体语言形式化是抗体语言模型的基础。
自然语言与抗体序列之间的明显相似性导致了将深度语言模型应用于抗体序列以预测同源抗原识别的热潮。然而,抗体语言的语言学形式定义并不存在,而且对抗体语言模型如何捕捉抗体特异性结合特征的深入了解在很大程度上仍无法解读。在此,我们将介绍如何通过表征抗体语言的词块和语法,对抗体语言进行语言形式化,从而解决目前在抗体语言模型规则挖掘方面所面临的挑战。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
11.70
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信