Negative binomial mixture model for identification of noise in antibody-antigen specificity predictions from single-cell data.

Impact factor: 2.4 · JCR quartile: Q2 (Mathematical & Computational Biology)

Bioinformatics Advances · Pub Date: 2024-12-04 · eCollection Date: 2024-01-01 · DOI: 10.1093/bioadv/vbae170

Perry T Wasdin, Alexandra A Abu-Shmais, Michael W Irvin, Matthew J Vukovich, Ivelin S Georgiev

Bioinformatics Advances, volume 4, issue 1, article vbae170. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631427/pdf/


Abstract

Motivation: LIBRA-seq (linking B cell receptor to antigen specificity by sequencing) provides a powerful tool for interrogating the antigen-specific B cell compartment and identifying antibodies against antigen targets of interest. Identification of noise in single-cell B cell receptor sequencing data, such as LIBRA-seq, is critical for improving antigen binding predictions for downstream applications including antibody discovery and machine learning technologies.

Results: In this study, we present a method for denoising LIBRA-seq data by clustering antigen counts into signal and noise components with a negative binomial mixture model. This approach leverages single-cell sequencing reads from a large, multi-donor dataset described in a recent LIBRA-seq study to develop a data-driven means of identifying technical noise. We apply this method to nine donors representing separate LIBRA-seq experiments and show that our approach provides improved predictions for in vitro antibody-antigen binding when compared to the standard scoring method, despite variance in data size and noise structure across samples. This development will improve the ability of LIBRA-seq to identify antigen-specific B cells and contribute to providing more reliable datasets for machine learning-based approaches as the corpus of single-cell B cell sequencing data continues to grow.

Availability and implementation: All data and code are available at https://github.com/IGlab-VUMC/mixture_model_denoising.
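
To make the idea in the Results paragraph concrete, the following is a minimal sketch of a two-component negative binomial mixture fitted to the counts of a single antigen. It is not the authors' implementation (see the repository above for that). The function names, the mean/size parameterization, the direct maximum-likelihood fit with Nelder-Mead, and the simulated counts are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import nbinom


def nb_logpmf(counts, mean, size):
    """Negative binomial log-pmf in mean/size form (variance = mean + mean**2 / size)."""
    # scipy uses (n, p); p = size / (size + mean) gives the requested mean.
    return nbinom.logpmf(counts, size, size / (size + mean))


def joint_log_terms(theta, counts):
    """Row 0: log(weight * pmf) for the noise component; row 1: same for signal."""
    noise_mean = np.exp(theta[0])
    signal_mean = noise_mean + np.exp(theta[1])  # keeps signal mean above noise mean
    noise_size, signal_size = np.exp(theta[2]), np.exp(theta[3])
    log_w_signal = -np.logaddexp(0.0, -theta[4])  # log(sigmoid(theta[4]))
    log_w_noise = -np.logaddexp(0.0, theta[4])    # log(1 - sigmoid(theta[4]))
    return np.stack([
        log_w_noise + nb_logpmf(counts, noise_mean, noise_size),
        log_w_signal + nb_logpmf(counts, signal_mean, signal_size),
    ])


def fit_nb_mixture(counts):
    """Fit the mixture by minimizing the negative log-likelihood (hypothetical helper)."""
    counts = np.asarray(counts)
    # Crude starting point: low percentile for noise, high percentile for signal.
    start = np.array([
        np.log(np.percentile(counts, 25) + 1.0),
        np.log(np.percentile(counts, 90) + 1.0),
        0.0, 0.0, 0.0,
    ])

    def negative_log_likelihood(theta):
        return -logsumexp(joint_log_terms(theta, counts), axis=0).sum()

    result = minimize(negative_log_likelihood, start, method="Nelder-Mead",
                      options={"maxiter": 5000, "maxfev": 5000})
    return result.x


def signal_probability(theta, counts):
    """Posterior probability that each count came from the signal component."""
    terms = joint_log_terms(theta, np.asarray(counts))
    return np.exp(terms[1] - logsumexp(terms, axis=0))


# Simulated antigen counts (an assumption, not LIBRA-seq data):
# 700 "noise" cells with mean 3 and 300 "signal" cells with mean 200.
rng = np.random.default_rng(0)
noise = rng.negative_binomial(2.0, 2.0 / (2.0 + 3.0), size=700)
signal = rng.negative_binomial(5.0, 5.0 / (5.0 + 200.0), size=300)
theta_hat = fit_nb_mixture(np.concatenate([noise, signal]))
probs = signal_probability(theta_hat, [0, 5, 50, 300])
print(probs)
```

In this sketch a cell would be called antigen-positive when its posterior signal probability exceeds a chosen cutoff. The cutoff, and whether antigens are modeled separately or jointly, are design choices the abstract does not specify.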


Source journal

Bioinformatics Advances

CiteScore

1.60

Self-citation rate

0.00%

Publication count