A comprehensive evaluation of diversity measures for TCR repertoire profiling.

IF 4.5 Zone 1 (Biology) Q1 BIOLOGY

BMC Biology Pub Date : 2025-05-14 DOI:10.1186/s12915-025-02236-5

Justyna Mika, Alicja Polanska, Kim Rm Blenman, Lajos Pusztai, Joanna Polanska, Serge Candéias, Michal Marczyk

BMC Biology 23(1):133. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12080070/pdf/

Citations: 0

Abstract

Background: T cells play a crucial role in adaptive immunity, as they monitor internal and external immunogenic signals through their specific receptors (TCRs). Using high-throughput sequencing, one can assess the TCR repertoire in various clinical settings and describe it quantitatively by calculating a diversity index. Multiple diversity indices that capture the richness of TCRs and the evenness of their distribution have been proposed in the literature; however, there is no consensus on gold-standard measures, and the interpretation of each index is complex. Our goal was to examine the performance characteristics of 12 commonly used diversity indices in simulated and real-world data.

Results: Simulated data were generated using three nonparametric models to evaluate how data richness and evenness affect index values. Fourteen real-world TCR datasets were obtained to examine how indices differ across analysis protocols and to test their robustness to subsampling. Pielou, Basharin, d50, and Gini primarily describe evenness and correlate highly with one another. They are best suited for measuring the representation of TCR clones. Richness is best captured by the S index, followed by Chao1 and ACE, which also take evenness into account. Shannon, Inv.Simpson, D3, D4, and Gini-Simpson measure richness while incorporating progressively more information on evenness. More skewed TCR distributions gave more stable results under subsampling. Gini-Simpson, Pielou, and Basharin were the most robust in both simulated and experimental data.
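To make the richness/evenness distinction concrete, the following is a minimal Python sketch of several of the indices named above, computed from a list of per-clone read counts. It uses common textbook definitions, which are an assumption here: the abstract does not give the exact formulas the authors used. In particular, Chao1 is shown in its bias-corrected form, d50 is taken as the smallest number of top clones covering half of all reads, and the function names and the two toy repertoires (`even`, `skewed`) are illustrative only.

```python
import math


def richness(counts):
    """S index: number of distinct clones observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon entropy (natural log) of the clone frequency distribution."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def pielou(counts):
    """Pielou evenness: Shannon entropy divided by its maximum, ln(S)."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


def gini_simpson(counts):
    """Gini-Simpson: probability that two random reads come from different clones."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def inverse_simpson(counts):
    """Inverse Simpson: effective number of equally abundant clones."""
    n = sum(counts)
    return 1.0 / sum((c / n) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from singletons and doubletons."""
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return richness(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def d50(counts):
    """Assumed definition: fewest top clones that together hold >= 50% of reads."""
    n = sum(counts)
    running = 0
    for k, c in enumerate(sorted(counts, reverse=True), start=1):
        running += c
        if 2 * running >= n:
            return k
    return 0


# Two toy repertoires with the same richness but very different evenness.
even = [10, 10, 10, 10]
skewed = [37, 1, 1, 1]

for name, rep in (("even", even), ("skewed", skewed)):
    print(name, richness(rep), round(pielou(rep), 3),
          round(gini_simpson(rep), 4), chao1(rep), d50(rep))
```

Both toy repertoires have the same S, so only the indices that carry evenness information separate them. This is the kind of contrast the simulated data in the study are designed to probe at scale.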

Conclusions: Our results could guide investigators to select the best diversity index for their particular experimental question and draw attention to factors that can influence the accuracy and reproducibility of results.
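One factor behind reproducibility that the Results mention is robustness to subsampling. The sketch below shows one way such a check could be set up: reads are repeatedly drawn without replacement from a repertoire at a fixed depth, and the coefficient of variation of an index across draws is reported. This is an illustration under stated assumptions, not the authors' protocol. The toy repertoire `toy`, the depth of 200 reads, the 200 repetitions, and the helper names are all hypothetical.

```python
import random
import statistics


def richness(counts):
    """Number of distinct clones observed at least once."""
    return sum(1 for c in counts if c > 0)


def gini_simpson(counts):
    """Probability that two random reads come from different clones."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement; return per-clone counts."""
    reads = [clone for clone, c in enumerate(counts) for _ in range(c)]
    out = [0] * len(counts)
    for clone in rng.sample(reads, depth):
        out[clone] += 1
    return out


def cv_under_subsampling(index, counts, depth, n_rep=200, seed=0):
    """Coefficient of variation of `index` across repeated subsamples."""
    rng = random.Random(seed)
    values = [index(subsample(counts, depth, rng)) for _ in range(n_rep)]
    return statistics.pstdev(values) / statistics.mean(values)


# Hypothetical repertoire: a few dominant clones and a long tail of rare ones
# (1000 reads in total).
toy = [500, 200, 100] + [5] * 20 + [1] * 100

for name, index in (("richness", richness), ("gini_simpson", gini_simpson)):
    print(name, round(cv_under_subsampling(index, toy, depth=200), 4))
```

A lower coefficient of variation means the index changes less from one subsample to the next. Comparing values across indices and across repertoires of different skew is one simple way to explore the stability behaviour described in the abstract.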

Source journal: BMC Biology (Biology)

CiteScore: 7.80

Self-citation rate: 1.90%

Articles published: 260

Review time: 3 months

Journal description: BMC Biology is a broad-scope journal covering all areas of biology. Its content includes research articles, new methods, and tools. BMC Biology also publishes reviews, Q&As, and commentaries.