Analyzing Immunomes using Sequence Embedding and Network Analysis

Kristina Motuzenko, Ilya Makarov
{"title":"Analyzing Immunomes using Sequence Embedding and Network Analysis","authors":"Kristina Motuzenko, Ilya Makarov","doi":"10.1109/SAMI58000.2023.10044509","DOIUrl":null,"url":null,"abstract":"The adaptive immune system helps us to resist the spread and eliminate potentially dangerous pathogens from an organism. This specific activity to pathogens is achieved through immune receptors such as B cell receptors (BCR) and their secreted version antibodies and T cell receptors (TCR). The sum of B cell and T cell receptors is the immune repertoire. The immune repertoire is the language of the immune system and its study would reveal information about antigen specificity, immune history, immune status, and therapeutics. CDR3 is the most variable part of TCRβ and makes the most significant contribution to the binding to the antigen. There are many Machine Learning and Deep Learning approaches for immune repertoire analysis. But the studies with attention-based model implementations lack. We take amino acid sequences of CDR3 fragments from TCRβ and use them as text for a state-of-the-art NLP model for the classification task. We choose ProtBert to obtain embeddings for a further downstream task to classify immune status. Using the BERT model did not yield significant results compared to the simpler model. Such a contradictory result can be explained by the fact that the attention mechanism could not catch high-level dependencies in sequences, and it is enough to use differences at the n-gram level to get a relatively good result.","PeriodicalId":179029,"journal":{"name":"2023 IEEE 21st World Symposium on Applied Machine Intelligence and Informatics (SAMI)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 21st World Symposium on Applied Machine Intelligence and Informatics (SAMI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SAMI58000.2023.10044509","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

The adaptive immune system helps the organism resist the spread of, and eliminate, potentially dangerous pathogens. This pathogen-specific activity is achieved through immune receptors such as B cell receptors (BCRs), their secreted form, antibodies, and T cell receptors (TCRs). The collection of B cell and T cell receptors constitutes the immune repertoire. The immune repertoire can be viewed as the language of the immune system, and its study can reveal information about antigen specificity, immune history, immune status, and potential therapeutics. CDR3 is the most variable part of the TCRβ chain and makes the most significant contribution to antigen binding. Many Machine Learning and Deep Learning approaches exist for immune repertoire analysis, but studies implementing attention-based models are lacking. We take amino acid sequences of CDR3 fragments from TCRβ and treat them as text for a state-of-the-art NLP model in a classification task. We choose ProtBert to obtain embeddings for the downstream task of classifying immune status. Using the BERT model did not yield significant improvements over the simpler model. This somewhat contradictory result can be explained by the attention mechanism failing to capture high-level dependencies in the sequences; differences at the n-gram level are sufficient to obtain a relatively good result.
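
To make the workflow described in the abstract concrete, the sketch below shows how CDR3β amino acid sequences could be embedded with ProtBert and compared against a simple character n-gram baseline. This is an illustrative sketch, not the authors' code: the public Rostlab/prot_bert checkpoint, mean pooling over token embeddings, logistic-regression classifiers, and the toy per-sequence labels are all assumptions made here for brevity.

```python
# Sketch (not the authors' code): embed CDR3 amino acid sequences with ProtBert
# and compare against a simple character n-gram baseline for immune-status
# classification. Checkpoint name, pooling, and classifiers are assumptions.
import torch
from transformers import BertModel, BertTokenizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy CDR3 beta sequences with hypothetical immune-status labels (0/1).
cdr3_sequences = ["CASSLGQAYEQYF", "CASSPGTSGSYEQYF", "CASSLRDRGNTEAFF"]
labels = [0, 1, 0]

# --- ProtBert embeddings ---------------------------------------------------
# ProtBert expects uppercase residues separated by spaces.
tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
protbert = BertModel.from_pretrained("Rostlab/prot_bert")
protbert.eval()

def embed(seq: str) -> torch.Tensor:
    spaced = " ".join(seq)
    inputs = tokenizer(spaced, return_tensors="pt")
    with torch.no_grad():
        hidden = protbert(**inputs).last_hidden_state  # (1, tokens, 1024)
    return hidden.mean(dim=1).squeeze(0)               # mean-pool over tokens

bert_features = torch.stack([embed(s) for s in cdr3_sequences]).numpy()
bert_clf = LogisticRegression(max_iter=1000).fit(bert_features, labels)

# --- n-gram baseline -------------------------------------------------------
# Character 3-grams over the raw amino acid strings stand in for the
# "simpler model" that works at the n-gram level.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(3, 3))
ngram_features = vectorizer.fit_transform(cdr3_sequences)
ngram_clf = LogisticRegression(max_iter=1000).fit(ngram_features, labels)
```

Note that the paper classifies immune status at the repertoire level, so in practice per-sequence embeddings would have to be aggregated across all CDR3 sequences of a donor before a classifier is applied; the per-sequence labels above are only a simplification for illustration.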