Multi-positive contrastive learning-based cross-attention model for T cell receptor–antigen binding prediction

Impact factor 4.9 | Zone 2, Medicine | JCR Q1: Computer Science, Interdisciplinary Applications
Yi Shuai, Pengcheng Shen, Xianrui Zhang
Computer Methods and Programs in Biomedicine, vol. 268, Article 108797. Published 2025-05-10. DOI: 10.1016/j.cmpb.2025.108797
https://www.sciencedirect.com/science/article/pii/S0169260725002147
Citations: 0

Abstract

Background and Objective:

T cells play a vital role in the immune system by recognizing and eliminating infected or cancerous cells, thus driving adaptive immune responses. Their activation is triggered by the binding of T cell receptors (TCRs) to epitopes presented on Major Histocompatibility Complex (MHC) molecules. However, experimentally identifying antigens that T cells can recognize and that are immunogenic is resource-intensive, and most candidates prove non-immunogenic, which underscores the need for computational tools that predict binding between peptide-MHC (pMHC) complexes and TCRs. Despite extensive efforts, accurately predicting TCR-antigen binding pairs remains challenging because of the vast diversity of TCRs.

Methods:

In this study, we propose ConTCR, a Contrastive Cross-attention model for predicting TCR and pMHC binding. First, pretrained encoders transform the pMHC and TCR sequences into high-level embeddings that serve as feature representations. Then, multi-modal cross-attention combines the features of the pMHC and TCR sequences. Next, the ConTCR backbone is pretrained with a contrastive learning strategy to strengthen the model’s feature extraction for pMHC and TCR sequences. Finally, the model is fine-tuned to classify positive and negative samples.
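The cross-attention and contrastive pretraining steps can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single-head attention without learned projections, the supervised-contrastive-style loss, the temperature of 0.1, and the idea that samples sharing a label (for example, TCRs binding the same epitope) act as positives are all assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product cross-attention (no learned projections).

    queries: per-residue TCR embeddings; keys_values: per-residue pMHC embeddings.
    Returns (attended, weights); weights[i][j] is how strongly TCR position i
    attends to pMHC position j, which is the kind of matrix a heatmap would show.
    """
    scale = math.sqrt(len(queries[0]))
    weights = [softmax([dot(q, k) / scale for k in keys_values]) for q in queries]
    dim = len(keys_values[0])
    attended = [
        [sum(w * kv[d] for w, kv in zip(row, keys_values)) for d in range(dim)]
        for row in weights
    ]
    return attended, weights


def multi_positive_contrastive_loss(embeddings, labels, temperature=0.1):
    """Assumed SupCon-style loss: every other sample with the same label is a positive.

    Embeddings must be non-zero vectors; they are L2-normalized before comparison.
    """
    z = [[x / math.sqrt(dot(v, v)) for x in v] for v in embeddings]
    losses = []
    for i, anchor in enumerate(z):
        positives = [j for j in range(len(z)) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # this anchor has no positive partner in the batch
        logits = {j: dot(anchor, z[j]) / temperature for j in range(len(z)) if j != i}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        # Average the negative log-probability over all positives of the anchor.
        losses.append(-sum(logits[j] - log_denom for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Toy inputs (hypothetical values, not data from the study).
tcr = [[1.0, 0.0], [0.0, 1.0]]
pmhc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, weights = cross_attention(tcr, pmhc)

batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clustered_loss = multi_positive_contrastive_loss(batch, [0, 0, 1, 1])
mixed_loss = multi_positive_contrastive_loss(batch, [0, 1, 0, 1])
```

In this toy batch the loss is lower when samples with the same label lie close together (`clustered_loss`) than when the labels cut across the clusters (`mixed_loss`), which is the behavior contrastive pretraining is meant to reward.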

Results:

With this strategy, the proposed model effectively captures the critical information on TCR-pMHC interactions, and attention score heatmaps visualize the model for interpretability. ConTCR demonstrates strong generalization in predicting binding specificity for unseen epitopes and diverse TCR repertoires. On independent non-zero-shot test sets, the model achieved AUC-ROC scores of 0.849 and 0.950; on zero-shot test sets, it obtained AUC-ROC scores of 0.830 and 0.938.
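For readers unfamiliar with the reported metric, AUC-ROC can be read as the probability that a randomly chosen binding pair receives a higher score than a randomly chosen non-binding pair. The sketch below computes it by direct pairwise comparison; the function name and the toy labels and scores are hypothetical and are not taken from the study.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical binding labels (1 = binder) and model scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_roc(labels, scores)
```

Here eight of the nine positive-negative pairs are ordered correctly, so `auc` is 8/9. The pairwise form is quadratic in the number of samples; it is chosen for clarity, and a rank-based computation would be preferable on large test sets.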

Conclusion:

Our framework offers a promising solution for improving pMHC-TCR binding prediction and model interpretability. By leveraging the ConTCR model and pMHC-TCR features, we achieve more precise predictions than recently proposed models. Overall, ConTCR is a robust tool for predicting pMHC-TCR binding and holds significant promise for advancing TCR-based immunotherapies. The code and data used in this study are available at this website.
Source journal
Computer Methods and Programs in Biomedicine (Engineering & Technology: Biomedical Engineering)
CiteScore: 12.30
Self-citation rate: 6.60%
Articles published: 601
Review time: 135 days
Journal description: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.