Extending gene set variation analysis with a reference dataset to stabilize scores.

IF 3.7; Zone 2, Biology; JCR Q2, BIOTECHNOLOGY & APPLIED MICROBIOLOGY
Lorin Towle-Miller, William Jordan, Alexandre Lockhart, Johannes Freudenburg, Aman Virmani, Mandy Bergquist, Jeffrey Miecznikowski, Will Powley
BMC Genomics, vol. 26, no. 1, p. 596. Published 2025-07-01. DOI: 10.1186/s12864-025-11769-6 (https://doi.org/10.1186/s12864-025-11769-6). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12211894/pdf/

Abstract

Background: Biological pathways are sets of genes that jointly drive biological processes. Rather than analyzing genes individually, it is common practice to summarize sets of related genes using gene set variation analysis (GSVA). In short, GSVA summarizes a set of genes into a single score bounded between -1 and 1, where negative values suggest downregulation and positive values suggest upregulation. Although this interpretation is simple in theory, it depends on unbiased estimation of individual gene distributions. In the current version of GSVA, gene distributions are estimated from the input dataset itself (i.e., the scores are calculated from the gene distributions of the same dataset that is being scored). This becomes a major issue when the study data do not adequately represent the full distribution of the population. For example, if RNA-seq data were collected on an imbalanced sample (e.g., more disease samples than healthy controls), it would be difficult to discern abnormalities in pathway activity because the gene distributions were estimated on a biased population. Therefore, we propose reference stabilizing GSVA (rsGSVA), which addresses this commonly ignored limitation by using reference datasets to estimate the gene distributions, yielding a more stable GSVA score.
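The imbalance problem described above can be illustrated with a minimal sketch. It is not the rsGSVA or GSVA implementation: GSVA uses a kernel estimate of each gene's cumulative distribution followed by a Kolmogorov-Smirnov-like random-walk statistic, whereas this toy version simply averages per-gene empirical CDF values and rescales them to [-1, 1]. The function names `ecdf_values` and `toy_set_score`, the simulated data, and the shift of 3 units are all assumptions made for illustration.

```python
import numpy as np

def ecdf_values(baseline, values):
    """Per-gene empirical CDF, estimated on `baseline`, evaluated at `values`.

    baseline: (n_genes, n_baseline_samples); values: (n_genes, n_samples).
    Returns an array shaped like `values` with entries in [0, 1].
    """
    base_sorted = np.sort(baseline, axis=1)
    n_base = base_sorted.shape[1]
    out = np.empty(values.shape, dtype=float)
    for g in range(values.shape[0]):
        # Number of baseline observations <= each value, as a fraction.
        out[g] = np.searchsorted(base_sorted[g], values[g], side="right") / n_base
    return out

def toy_set_score(expr, gene_idx, reference=None):
    """Toy gene set score in [-1, 1] (a simplification, NOT the GSVA statistic).

    With reference=None the gene distributions come from `expr` itself;
    otherwise they come from the separate `reference` matrix.
    """
    baseline = expr if reference is None else reference
    cdf = ecdf_values(baseline, expr)
    return 2.0 * cdf[gene_idx].mean(axis=0) - 1.0

# Simulated data (hypothetical): 50 genes, the first 10 form the gene set.
rng = np.random.default_rng(0)
n_genes, gene_idx = 50, np.arange(10)
reference = rng.normal(0.0, 1.0, size=(n_genes, 200))  # healthy-like reference
study = rng.normal(0.0, 1.0, size=(n_genes, 20))       # 18 disease + 2 healthy
study[:10, :18] += 3.0  # gene set is upregulated in the 18 disease samples

self_scores = toy_set_score(study, gene_idx)            # input-based baseline
ref_scores = toy_set_score(study, gene_idx, reference)  # reference-based baseline

print("disease mean, input-based:    ", round(float(self_scores[:18].mean()), 3))
print("disease mean, reference-based:", round(float(ref_scores[:18].mean()), 3))
```

Because the disease samples dominate the study, the input-based baseline treats their expression as typical and their scores sit close to zero, while the reference-based baseline places them clearly in the upregulated range.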

Results: rsGSVA shows comparable power to classic GSVA, singscore, and ssGSEA under ideal settings while demonstrating stable scores on sample subsets. An application to irritable bowel disease highlights the interpretational advantages of rsGSVA over other methods in the up/down regulation of inflammation signatures.

Conclusions: The rsGSVA technique enhances the GSVA functionality by incorporating a reference dataset. This integration of a reference dataset makes the enrichment scores independent of the input distribution and ensures their stability and reproducibility, even as samples are added or removed.
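
The stability property stated in the conclusions can be checked on simulated data. The sketch below is an assumption-laden illustration, not the authors' code: `ecdf_score` is a hypothetical helper that averages per-gene empirical CDF values and rescales them to [-1, 1], standing in for the full GSVA statistic. It scores the same six samples twice, once as part of a 12-sample study and once after the other six samples are removed.

```python
import numpy as np

def ecdf_score(baseline, samples, gene_idx):
    """Toy score in [-1, 1]: 2 * (mean per-gene ECDF value) - 1.

    The ECDF of each gene is estimated on `baseline` and evaluated at
    `samples`; both are (n_genes, n_columns) matrices.
    """
    base = baseline[gene_idx][:, :, None]  # (k, n_baseline, 1)
    vals = samples[gene_idx][:, None, :]   # (k, 1, n_samples)
    cdf = (base <= vals).mean(axis=1)      # (k, n_samples)
    return 2.0 * cdf.mean(axis=0) - 1.0

# Simulated data (hypothetical): 30 genes, the first 5 form the gene set.
rng = np.random.default_rng(1)
gene_idx = np.arange(5)
reference = rng.normal(0.0, 1.0, size=(30, 100))
study = rng.normal(0.0, 1.0, size=(30, 12))
study[:5, 6:] += 5.0     # last 6 samples have a strongly upregulated gene set
subset = study[:, :6]    # the same study after removing those 6 samples

# Input-based scoring: the baseline changes when samples are removed.
full_self = ecdf_score(study, study, gene_idx)[:6]
sub_self = ecdf_score(subset, subset, gene_idx)

# Reference-based scoring: the baseline is fixed, so scores cannot move.
full_ref = ecdf_score(reference, study, gene_idx)[:6]
sub_ref = ecdf_score(reference, subset, gene_idx)

print("max change, input-based:    ", float(np.max(np.abs(full_self - sub_self))))
print("max change, reference-based:", float(np.max(np.abs(full_ref - sub_ref))))
```

With a fixed reference, each sample's score depends only on that sample and the reference, so adding or removing other samples leaves it unchanged; with an input-based baseline, the same samples receive different scores depending on which other samples are present.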
