Contrastive learning of T cell receptor representations
Yuta Nagano, Andrew G T Pyo, Martina Milighetti, James Henderson, John Shawe-Taylor, Benny Chain, Andreas Tiffeau-Mayer
Cell Systems, article 101165. Published online January 7, 2025 (issue date January 15, 2025).
DOI: 10.1016/j.cels.2024.12.006
Abstract
Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labeled TCR data remain sparse. In other domains, the pre-training of language models on unlabeled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein language models for TCR specificity prediction. Here, we introduce a TCR language model called SCEPTR (simple contrastive embedding of the primary sequence of T cell receptors), which is capable of data-efficient transfer learning. Through our model, we introduce a pre-training strategy combining autocontrastive learning and masked-language modeling, which enables SCEPTR to achieve state-of-the-art performance. In contrast, existing protein language models and a variant of SCEPTR pre-trained without autocontrastive learning are outperformed by sequence alignment-based methods. We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity. A record of this paper's transparent peer review process is included in the supplemental information.
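The abstract does not spell out the objective, but autocontrastive learning of the kind described is commonly implemented SimCSE-style: the same sequence is encoded twice under different dropout masks, and an InfoNCE loss treats the two resulting embeddings as a positive pair against in-batch negatives. The PyTorch sketch below illustrates that idea under those assumptions; the encoder, temperature, and view construction are illustrative placeholders rather than SCEPTR's actual implementation, and in the paper's setup this term would be combined with a masked-language-modeling loss.

```python
# Hypothetical sketch of an autocontrastive objective of the kind the
# abstract describes (SimCSE-style). Not SCEPTR's actual code.
import torch
import torch.nn.functional as F


def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE over two batches of paired embeddings, shape (batch, dim).

    z1[i] and z2[i] are embeddings of two stochastic views of the same
    TCR (e.g., two forward passes with different dropout masks); every
    other pairing in the batch serves as a negative.
    """
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature                 # (batch, batch) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)          # diagonal entries are the positives


# Toy usage: a stand-in encoder with dropout, so that two forward passes
# over the same batch yield two distinct views of each "sequence".
encoder = torch.nn.Sequential(
    torch.nn.Linear(32, 64),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.1),
    torch.nn.Linear(64, 16),
)
encoder.train()                                      # keep dropout active
x = torch.randn(8, 32)                               # stand-in for tokenized TCRs
loss = info_nce_loss(encoder(x), encoder(x))         # two passes -> two dropout views
loss.backward()
```

In a full pre-training loop, the total loss would be this contrastive term plus the masked-language-modeling cross-entropy, so the model learns both sequence-level embeddings and token-level reconstruction.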