A comprehensive evaluation of diversity measures for TCR repertoire profiling
Justyna Mika, Alicja Polanska, Kim RM Blenman, Lajos Pusztai, Joanna Polanska, Serge Candéias, Michal Marczyk
BMC Biology, volume 23(1), article 133. Published 2025-05-14. Journal Article.
DOI: https://doi.org/10.1186/s12915-025-02236-5
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12080070/pdf/
Abstract
Background: T cells play a crucial role in adaptive immunity, as they monitor internal and external immunogenic signals through their specific receptors (TCRs). Using high-throughput sequencing, one can assess the TCR repertoire in various clinical settings and describe it quantitatively by calculating a diversity index. Multiple diversity indices that capture the richness of TCRs and the evenness of their distribution have been proposed in the literature; however, there is no consensus on gold-standard measures, and the interpretation of each index is complex. Our goal was to examine the performance characteristics of 12 commonly used diversity indices in simulated and real-world data.
Results: Simulated data were generated with three nonparametric models to evaluate how data richness and evenness affect index values. Fourteen real-world TCR datasets were obtained to examine how the indices differ across analysis protocols and to test their robustness to subsampling. Pielou, Basharin, d50, and Gini primarily describe evenness and correlate highly with one another. They are best suited for measuring the representation of TCR clones. Richness is best captured by the S index, followed by Chao1 and ACE, which also take evenness into account. Shannon, Inv.Simpson, D3, D4, and Gini-Simpson measure richness while incorporating progressively more information on evenness. More skewed TCR distributions provided more stable results in subsampling. Gini-Simpson, Pielou, and Basharin were the most robust in both simulated and experimental data.
Conclusions: Our results could help investigators select the best diversity index for their particular experimental question and draw attention to factors that can influence the accuracy and reproducibility of results.
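To make the richness/evenness distinction in the Results concrete, the following is a minimal sketch of a few of the named indices, computed from a list of per-clone read counts. It uses common textbook definitions (natural-log Shannon entropy, Pielou as Shannon divided by ln S, the bias-corrected form of Chao1), which are an assumption here and may differ from the exact formulas used in the paper; Basharin, d50, Gini, ACE, D3, and D4 are omitted. The function names `diversity_indices` and `subsample` are hypothetical, and the subsampling routine is only one simple way to draw reads without replacement, not the authors' protocol.

```python
import math
import random
from collections import Counter


def diversity_indices(counts):
    """Compute a few textbook diversity indices from per-clone read counts."""
    c = [x for x in counts if x > 0]           # ignore clones with no reads
    total = sum(c)
    p = [x / total for x in c]                 # clone frequencies
    s = len(c)                                 # richness: number of observed clones
    shannon = -sum(q * math.log(q) for q in p)
    simpson = sum(q * q for q in p)            # probability two reads share a clone
    f1 = sum(1 for x in c if x == 1)           # singletons
    f2 = sum(1 for x in c if x == 2)           # doubletons
    return {
        "S": s,
        "Shannon": shannon,
        # Pielou evenness is undefined for a single clone (ln 1 = 0)
        "Pielou": shannon / math.log(s) if s > 1 else float("nan"),
        "Gini-Simpson": 1.0 - simpson,
        "Inv.Simpson": 1.0 / simpson,
        # Bias-corrected Chao1 richness estimate (assumed variant)
        "Chao1": s + f1 * (f1 - 1) / (2 * (f2 + 1)),
    }


def subsample(counts, n_reads, seed=0):
    """Draw n_reads reads without replacement; return new per-clone counts."""
    rng = random.Random(seed)
    # One label per read, so each read is equally likely to be drawn
    labels = [i for i, x in enumerate(counts) for _ in range(x)]
    drawn = Counter(rng.sample(labels, n_reads))
    return [drawn.get(i, 0) for i in range(len(counts))]


if __name__ == "__main__":
    even = [25, 25, 25, 25]      # four equally represented clones
    skewed = [97, 1, 1, 1]       # one dominant clone plus three singletons
    for name, rep in (("even", even), ("skewed", skewed)):
        print(name, diversity_indices(rep))
        print(name, "subsampled:", diversity_indices(subsample(rep, 50)))
```

Both toy repertoires have the same observed richness (S = 4), so any difference in the other indices reflects evenness alone; comparing the full and subsampled values gives a rough feel for the kind of robustness check the abstract describes.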
About the journal:
BMC Biology is a broad-scope journal covering all areas of biology. Our content includes research articles, new methods and tools. BMC Biology also publishes reviews, Q&A, and commentaries.