Weighted overlapping group lasso for integrating prior network knowledge into gene set analysis.

IF 3.3; Zone 3 (Biology); JCR Q2, BIOCHEMICAL RESEARCH METHODS
Dan Huang, Geunsu Jo, Kipoong Kim, Hokeun Sun
BMC Bioinformatics 26(1):226. Published 2025-09-01. Journal Article.
DOI: 10.1186/s12859-025-06170-9 (https://doi.org/10.1186/s12859-025-06170-9)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12403420/pdf/
Citations: 0

Abstract


Weighted overlapping group lasso for integrating prior network knowledge into gene set analysis.


Background: Gene set analysis aims to identify gene sets containing genes that are differentially expressed between two experimental conditions. A representative example of a gene set is a gene regulatory network, in which multiple genes are linked with each other to regulate gene expression. Most statistical methods for gene set analysis were designed to capture group-based association signals and ignore the genetic network structure. Consequently, they often fail to identify gene sets in which only a few genes are differentially expressed and the association signals are sparse.

Results: We propose a new computational method that uses prior network knowledge for gene set analysis. The proposed method essentially incorporates the coefficient estimates of network-based regularization into overlapping group lasso. Network-based regularization can boost association signals among linked genes, while overlapping group lasso selects gene sets that include differentially expressed genes. In an extensive simulation study, we evaluated the performance of the proposed method and compared it with existing methods. We also applied it to gene expression data from The Cancer Genome Atlas Breast Invasive Carcinoma Collection (TCGA-BRCA) and were able to identify cancer-related pathways that were missed by the existing methods.

Conclusion: Overlapping group lasso is a regularization method for group selection that allows groups to share variables. Network-based regularization is a variable selection method that uses graph information among variables. The proposed weighted overlapping group lasso (wOGL) uses the coefficient estimates of network-based regularization as the weights of overlapping group lasso. Consequently, it can identify gene sets containing differentially expressed genes while making use of prior network knowledge.
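
The weighting idea in the conclusion can be illustrated with a minimal sketch. The abstract does not state how the weights are computed or which overlapping-group formulation is used, so everything below is an assumption made for illustration only: the adaptive-style weight (the inverse of the group norm of the network-regularized estimates), the plain sum-of-group-norms penalty, and the names `group_weights`, `weighted_group_penalty`, `beta_net`, and `eps` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def group_weights(beta_net, groups, eps=1e-8):
    """Hypothetical adaptive-style weights (an assumption, not the paper's formula):
    a group whose network-regularized coefficients are large gets a small weight,
    so it is penalized less."""
    return np.array([1.0 / (np.linalg.norm(beta_net[g]) + eps) for g in groups])

def weighted_group_penalty(beta, groups, weights):
    """Weighted sum of Euclidean norms of each group's coefficients.
    Groups are index arrays and may share indices (overlap)."""
    return float(sum(w * np.linalg.norm(beta[g]) for g, w in zip(groups, weights)))

# Two toy gene sets that share gene index 2.
groups = [np.array([0, 1, 2]), np.array([2, 3])]

# Made-up coefficient estimates standing in for a network-regularized fit.
beta_net = np.array([3.0, 4.0, 0.0, 1.0])
weights = group_weights(beta_net, groups)

# Made-up candidate coefficients whose penalty we evaluate.
beta = np.array([1.0, 0.0, 0.0, 2.0])
penalty = weighted_group_penalty(beta, groups, weights)

print(weights)
print(penalty)
```

In this toy setup the first gene set has the stronger network-based signal, so it receives the smaller weight and contributes less to the penalty, which is the intuition behind letting network information guide gene set selection. A full fit would also need a loss function and an optimizer, which are not sketched here.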

Source journal: BMC Bioinformatics
Category: Biology, Biochemical Research Methods
CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months
About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.