Optimal solution to the set cover problem with a vicinity constraint for estimating genotype tissue expression profiles.

IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Bioinformatics advances. Pub Date: 2025-07-04; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf163

Jiahong Dong, Stephen Brown, Kevin Truong


Citations: 0

Abstract



Motivation: Genes located in close genomic proximity tend to have more similar genotype tissue expression profiles. This suggests that expression profiles for the entire genome could be estimated using a smaller set of experimentally determined profiles from carefully selected reference genes, thereby reducing the need for extensive experimental measurements.

Results: We address this challenge by formulating it as a set cover problem, aiming to identify the smallest number of gene sets that covers the entire genome. On large datasets, however, traditional set cover algorithms are either slow or yield non-optimal results. To overcome this limitation, we developed a dynamic programming algorithm that exploits the consecutive ordering of genes within vicinity sets. Our algorithm solves this vicinity set cover problem in tractable runtime while minimizing the average distance between reference genes and the non-reference genes in their vicinity, thereby maximizing estimation accuracy. The algorithm can be used to reduce the number of experiments required for organisms lacking genotype tissue expression data, or for new human datasets with expanded tissue sets. Lastly, the algorithm has broader applications to set cover optimization problems in other fields.

Availability and implementation: The source code, along with all implementation details, is available at https://github.com/sensationTI/vicinity_set_cover.
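The idea in the Results paragraph can be illustrated with a small sketch. This is not the authors' implementation (that is in the repository linked above); it is a minimal example under stated assumptions: genes lie on a single chromosome and are given as sorted positions, a reference gene covers every gene within a fixed `radius`, and the function name `vicinity_cover` and its parameters are hypothetical. Because genes are ordered, assigning each gene to its nearest reference splits the chromosome into consecutive blocks, so a dynamic program over block boundaries can find the fewest references and, among those solutions, the smallest total distance.

```python
from typing import List, Optional, Tuple


def vicinity_cover(positions: List[float], radius: float) -> Tuple[List[int], float]:
    """Illustrative DP for a 1D vicinity set cover (hypothetical helper).

    positions must be sorted in ascending order. Returns the indices of the
    chosen reference genes and the total distance from every gene to the
    reference of its block.
    """
    n = len(positions)
    inf = (float("inf"), float("inf"))
    # best[k] = (number of references, total distance) for genes 0..k-1.
    # Tuples compare lexicographically: fewer references first, then distance.
    best: List[Tuple[float, float]] = [inf] * (n + 1)
    best[0] = (0, 0.0)
    # choice[k] = (start of the last block, reference index of that block).
    choice: List[Optional[Tuple[int, int]]] = [None] * (n + 1)

    for end in range(1, n + 1):
        # Try every consecutive block start..end-1 as the last block.
        for start in range(end - 1, -1, -1):
            # No single reference can reach both ends of a block this wide,
            # and moving start further left only widens it.
            if positions[end - 1] - positions[start] > 2 * radius:
                break
            for ref in range(start, end):
                # The reference must reach both ends of the block.
                if positions[ref] - positions[start] > radius:
                    continue
                if positions[end - 1] - positions[ref] > radius:
                    continue
                dist = sum(abs(positions[k] - positions[ref]) for k in range(start, end))
                cand = (best[start][0] + 1, best[start][1] + dist)
                if cand < best[end]:
                    best[end] = cand
                    choice[end] = (start, ref)

    # Walk back through the stored choices to recover the reference genes.
    refs: List[int] = []
    k = n
    while k > 0:
        start, ref = choice[k]
        refs.append(ref)
        k = start
    refs.reverse()
    return refs, best[n][1]
```

For example, `vicinity_cover([0, 2, 4], 2)` selects the middle gene as the only reference. The sketch recomputes block distances naively, so it runs in polynomial but not optimized time; prefix sums would be one way to speed it up.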


Source journal

Bioinformatics advances

CiteScore

1.60

Self-citation rate

0.00%

Articles published