gdGSE: An algorithm to evaluate pathway enrichment by discretizing gene expression values.

IF 4.4 · Zone 2, Biology · JCR Q2, BIOCHEMISTRY & MOLECULAR BIOLOGY
Computational and structural biotechnology journal Pub Date : 2025-05-01 eCollection Date: 2025-01-01 DOI:10.1016/j.csbj.2025.04.038
Jiangti Luo, Qiqi Lu, Mengjiao He, Xiaobo Zhang, Xiang Yang, Xiaosheng Wang
Volume 27, pages 1772-1783 · Journal Article · Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12127574/pdf/ · Article link: https://doi.org/10.1016/j.csbj.2025.04.038
Citations: 0

Abstract

gdGSE: An algorithm to evaluate pathway enrichment by discretizing gene expression values.

We propose gdGSE, a novel computational framework for gene set enrichment analysis. Unlike conventional methods that rely on continuous gene expression values, gdGSE uses discretized gene expression profiles to assess pathway activity. This approach mitigates discrepancies caused by differences in data distributions. The algorithm consists of two steps: (1) applying statistical thresholds to binarize the gene expression matrix, and (2) converting the binarized gene expression matrix into a gene set enrichment matrix. Our results demonstrate that gdGSE robustly extracts biological insights from a diverse array of simulated and real bulk or single-cell gene expression datasets. Notably, gene set enrichment scores from gdGSE showed enhanced utility in downstream applications: (1) precise quantification of cancer stemness with significant prognostic relevance; (2) improved clustering performance in stratifying tumor subtypes with distinct prognoses; and (3) more accurate identification of cell types. Remarkably, the pathway activity scores from gdGSE showed >90% concordance with experimentally validated drug mechanisms in patient-derived xenografts and estrogen receptor-positive breast cancer cell lines. Our results suggest that discretizing gene expression values provides an alternative method for evaluating pathway enrichment, applicable to both bulk and single-cell data analysis.
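
The two steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which statistical thresholds are used or how the binarized matrix is converted into enrichment scores, so both rules here are assumptions chosen only for illustration: each gene is binarized against its own median across samples, and each gene set is scored as the fraction of its member genes that are "on" in a sample. The function names `binarize_expression` and `gene_set_scores` and the toy data are hypothetical and are not taken from the gdGSE software.

```python
import numpy as np


def binarize_expression(expr):
    """Step 1 (assumed rule): a gene is 1 in a sample when its value
    exceeds that gene's median across all samples, otherwise 0."""
    expr = np.asarray(expr, dtype=float)
    thresholds = np.median(expr, axis=1, keepdims=True)  # one threshold per gene
    return (expr > thresholds).astype(int)


def gene_set_scores(binary, gene_names, gene_sets):
    """Step 2 (assumed rule): score each gene set in each sample as the
    fraction of its member genes that are 1 in that sample."""
    index = {gene: i for i, gene in enumerate(gene_names)}
    names = list(gene_sets)
    scores = np.zeros((len(names), binary.shape[1]))
    for row, name in enumerate(names):
        # Genes missing from the expression matrix are ignored.
        members = [index[g] for g in gene_sets[name] if g in index]
        if members:
            scores[row] = binary[members].mean(axis=0)
    return names, scores


# Toy data: 4 genes (rows) x 3 samples (columns).
genes = ["G1", "G2", "G3", "G4"]
expr = np.array([
    [1.0, 5.0, 9.0],
    [2.0, 2.0, 8.0],
    [7.0, 3.0, 1.0],
    [0.5, 0.6, 0.4],
])
sets = {"set_A": ["G1", "G2"], "set_B": ["G3", "G4", "G9"]}

binary = binarize_expression(expr)
names, scores = gene_set_scores(binary, genes, sets)
print(binary)
print(names)
print(scores)
```

Because the scores depend only on whether each value clears its gene-specific threshold, any monotone rescaling of a gene's values leaves this toy score unchanged, which is one way a discretized approach can reduce sensitivity to the underlying data distribution.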

Source journal
Computational and structural biotechnology journal
Subject area: Biochemistry, Genetics and Molecular Biology-Biophysics
CiteScore: 9.30
Self-citation rate: 3.30%
Articles published: 540
Review time: 6 weeks
About the journal: Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a prerequisite for publication in the journal. Specific areas of interest include, but are not limited to: structure and function of proteins, nucleic acids and other macromolecules; structure and function of multi-component complexes; protein folding, processing and degradation; enzymology; computational and structural studies of plant systems; microbial informatics; genomics; proteomics; metabolomics; algorithms and hypothesis in bioinformatics; mathematical and theoretical biology; computational chemistry and drug discovery; microscopy and molecular imaging; nanotechnology; systems and synthetic biology.