deepNGS navigator: exploring antibody NGS datasets using deep contrastive learning.

Impact factor: 5.4
Homa MohammadiPeyhani, Edith Lee, Richard Bonneau, Vladimir Gligorijevic, Jae Hyeon Lee
Journal: Bioinformatics (Oxford, England)
Publication date: 2025-09-01
DOI: 10.1093/bioinformatics/btaf414 (https://doi.org/10.1093/bioinformatics/btaf414)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448221/pdf/

Abstract

Motivation: High-throughput sequencing reveals how B-cells adapt in response to antigens by generating B-cell-receptor (BCR) sequences at an unprecedented scale. As BCR datasets grow to millions of sequences, efficient computational methods become crucial. One important aspect of antibody sequence analysis is detecting clonal families, or clusters of related sequences, whether the sequences come from immunization, synthetic libraries, or even ML-generated datasets.

Results: We introduce deepNGS Navigator, a computational tool that leverages language models and contrastive learning to transform antibody sequences into intuitive 2D representations. The resulting 2D maps visualize the overall diversity of an input dataset, and the sequences can be clustered based on their distances and densities across the map. Beyond grouping related sequences, the 2D maps also represent mutational patterns inferred from sequence embeddings, enabling trajectory analysis and clustering within the projected space. Overlaying properties such as charge on the map helps identify clusters of interest for further investigation, while also flagging potentially noisy or non-specific sequences that carry higher risk. We demonstrate deepNGS Navigator's utility on several datasets, including: (i) a yeast-display synthetic library targeting HER2, (ii) a machine learning-generated dataset with a hierarchical structure, (iii) NGS sequences from a llama immunized against the SARS-CoV-2 receptor-binding domain (RBD), (iv) human naive and memory B-cell sequences, and (v) an in silico dataset simulating B-cell clonal lineages.
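
As a rough illustration of overlaying a property such as charge on clustered sequences, the sketch below computes an approximate net charge per sequence and averages it per cluster. It is not taken from the deepNGS Navigator code base: the function names, the toy sequences, and the cluster labels are all hypothetical, and the charge estimate is a deliberate simplification (count of K and R minus count of D and E, ignoring histidine, termini, and pH effects).

```python
from collections import defaultdict

POSITIVE = set("KR")  # lysine, arginine
NEGATIVE = set("DE")  # aspartate, glutamate


def net_charge(seq: str) -> int:
    """Approximate net charge near neutral pH: (K + R) - (D + E)."""
    seq = seq.upper()
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def mean_charge_by_cluster(seqs, labels):
    """Average the approximate net charge of the sequences in each cluster."""
    grouped = defaultdict(list)
    for seq, label in zip(seqs, labels):
        grouped[label].append(net_charge(seq))
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}


# Toy fragments and hypothetical cluster labels (illustrative only, not real data).
toy_seqs = ["ARDKRGY", "ARDRKGY", "AEDDYGM", "AEDEYGM"]
toy_labels = [0, 0, 1, 1]

summary = mean_charge_by_cluster(toy_seqs, toy_labels)
print(summary)
```

In practice the labels would come from whatever clustering is applied to the 2D map, and the per-cluster summary could be used to colour the map or to shortlist clusters for closer inspection.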

Availability and implementation: The deepNGS Navigator source code is available at: github.com/prescient-design/deepngs-navigator and github.com/prescient-design/deepngs-navigator-panel-app.

Supplementary information: Supplementary data, including implementation details and additional data, are available online.