Improving data interpretability with new differential sample variance gene set tests.

IF 3.3 | Region 3, Biology | JCR Q2, Biochemical Research Methods

BMC Bioinformatics Pub Date : 2025-04-14 DOI:10.1186/s12859-025-06117-0

Yasir Rahmatallah, Galina Glazko

BMC Bioinformatics 26(1):103 (2025). Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11998189/pdf/

Citations: 0

Abstract


Improving data interpretability with new differential sample variance gene set tests.

Background: Gene set analysis methods have played a major role in generating biological interpretations of omics data such as gene expression datasets. However, most methods focus on detecting homogeneous pattern changes in mean expression, while methods that detect pattern changes in variance remain poorly explored. Although a few studies have attempted gene-level variance analysis, such an approach remains underutilized. When two phenotypes are compared, gene sets with distinct changes in subgroups under one phenotype are overlooked by available methods, even though they reflect meaningful biological differences between the two phenotypes. Multivariate sample-level variance analysis methods are needed to detect such pattern changes.

Results: We used ranking schemes based on the minimum spanning tree to generalize the Cramér-von Mises and Anderson-Darling univariate statistics into multivariate gene set analysis methods that detect differential sample variance or mean. Using simulations with different parameters, we characterized the detection power and Type I error rate of these methods and of two methods developed earlier. We applied the developed methods to a microarray gene expression dataset of prednisolone-resistant and prednisolone-sensitive children diagnosed with B-lineage acute lymphoblastic leukemia and to a bulk RNA-sequencing gene expression dataset of benign hyperplastic polyps and potentially malignant sessile serrated adenoma/polyps. One or both of the two compared phenotypes in each of these datasets have distinct molecular subtypes that contribute to within-phenotype variability and to heterogeneous differences between the two compared phenotypes. Our results show that methods designed to detect differential sample variance provide meaningful biological interpretations by detecting specific hallmark gene sets associated with the two compared phenotypes, as documented in the available literature.
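To make the idea of an MST-based ranking concrete, here is a minimal Python sketch of one way such a test could be put together. It is not the GSAR implementation and may differ from the authors' ranking schemes in important details. The assumptions are: samples are ranked by their path length along the minimum spanning tree from the tree's centre (a radial-style ranking, aimed at variance rather than mean differences), the standard two-sample Cramér-von Mises criterion is computed on that ranking, and significance comes from permuting the phenotype labels. All function names are hypothetical.

```python
import math
import random
from collections import deque


def mst_adjacency(points):
    """Prim's algorithm on the complete Euclidean graph; returns adjacency lists."""
    n = len(points)
    adj = [[] for _ in range(n)]
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append(parent[u])
            adj[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return adj


def tree_distances(adj, points, src):
    """Path length along the tree from src to every vertex (paths are unique)."""
    dist = [-1.0] * len(adj)
    dist[src] = 0.0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + math.dist(points[u], points[v])
                queue.append(v)
    return dist


def radial_order(points):
    """Order samples by tree distance from the centre (minimum eccentricity)."""
    adj = mst_adjacency(points)
    all_dist = [tree_distances(adj, points, s) for s in range(len(points))]
    root = min(range(len(points)), key=lambda s: max(all_dist[s]))
    depth = all_dist[root]
    return sorted(range(len(points)), key=lambda i: (depth[i], i))


def cvm_statistic(order, labels):
    """Two-sample Cramér-von Mises criterion evaluated along a given ordering."""
    n1 = sum(labels)
    n2 = len(labels) - n1
    c1 = c2 = 0
    total = 0.0
    for idx in order:
        if labels[idx]:
            c1 += 1
        else:
            c2 += 1
        total += (c1 / n1 - c2 / n2) ** 2
    return n1 * n2 / (n1 + n2) ** 2 * total


def permutation_pvalue(points, labels, n_perm=999, seed=0):
    """Permutation p-value; the ranking ignores labels, so it is computed once."""
    rng = random.Random(seed)
    order = radial_order(points)
    observed = cvm_statistic(order, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cvm_statistic(order, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: two groups with the same mean but different spread (5 "genes").
rng = random.Random(1)
tight = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(20)]
wide = [[rng.gauss(0, 3) for _ in range(5)] for _ in range(20)]
stat, p = permutation_pvalue(tight + wide, [1] * 20 + [0] * 20)
print(f"statistic = {stat:.3f}, permutation p-value = {p:.3f}")
```

Ranking from the tree centre outward places tightly clustered samples early and dispersed samples late, so a difference in spread between the two phenotypes shows up as a difference between their empirical distributions along the ranking. An Anderson-Darling variant would reweight the same squared differences to emphasize the ends of the ranking.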

Conclusions: The results of this study demonstrate the usefulness of methods designed to detect differential sample variance in providing biological interpretations when biologically relevant but heterogeneous changes between two phenotypes are prevalent in specific signaling pathways. A software implementation of the methods, with detailed documentation, is available in the Bioconductor package GSAR. The available methods are applicable to gene expression datasets in normalized matrix form and could be used with other omics datasets in normalized matrix form for which a collection of feature sets is available.


Source journal: BMC Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months

Journal description: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.