Optimal solution to the set cover problem with a vicinity constraint for estimating genotype tissue expression profiles.
Jiahong Dong, Stephen Brown, Kevin Truong
Bioinformatics Advances 5(1): vbaf163. Published 2025-07-04. DOI: 10.1093/bioadv/vbaf163
Motivation: Genes located in close genomic proximity tend to have more similar genotype tissue expression profiles. This suggests that expression profiles for the entire genome could be estimated using a smaller set of experimentally determined profiles from carefully selected reference genes, thereby reducing the need for extensive experimental measurements.
Results: We address this challenge by casting it as a set cover problem: find the smallest number of gene sets that together cover the entire genome. However, traditional set cover algorithms either run too slowly or return suboptimal solutions on large datasets. To overcome this limitation, we developed a dynamic programming algorithm that exploits the fact that the genes in each vicinity set are consecutive in genomic order. The algorithm solves this vicinity set cover problem in tractable runtime while minimizing the average distance between reference genes and the non-reference genes in their vicinity, thereby maximizing estimation accuracy. It can reduce the number of experiments required in organisms that lack genotype tissue expression data, or in new human datasets that cover additional tissues. Finally, the algorithm is applicable to set cover optimization problems in other fields.
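The abstract does not give the recurrence itself, so the following is only a minimal sketch of why consecutive ordering makes an exact dynamic program possible. It assumes (none of this is stated in the source): genes lie on one chromosome and are sorted by position; a gene is covered if it lies within a fixed distance `d` of a reference gene; each non-reference gene is scored against its nearest reference; and the objective is lexicographic, fewest references first, then smallest total distance (for a fixed number of references this is the same as smallest average distance). The function name `vicinity_set_cover` is hypothetical and is not the API of the linked repository.

```python
from bisect import bisect_right
from itertools import accumulate
from math import inf


def vicinity_set_cover(pos, d):
    """Hypothetical sketch, not the authors' implementation.

    pos: gene positions on one chromosome, sorted ascending (assumption).
    d:   vicinity radius; a gene is covered if some reference is within d.
    Returns (reference gene indices, total gene-to-nearest-reference distance),
    minimizing the number of references first, then the total distance.
    """
    n = len(pos)
    if n == 0:
        return [], 0
    prefix = [0] + list(accumulate(pos))  # prefix[i] = sum(pos[:i])

    def range_sum(lo, hi):  # sum(pos[lo:hi]) in O(1)
        return prefix[hi] - prefix[lo]

    def gap_cost(a, b):
        """Distance cost of genes strictly between consecutive references a < b,
        each assigned to the nearer one; None if a gene is farther than d from both."""
        lo, hi = a + 1, b
        # First gene between a and b that reference a cannot reach.
        i = bisect_right(pos, pos[a] + d, lo, hi)
        if i < hi and pos[i] < pos[b] - d:  # out of reach of b as well
            return None
        # Genes up to the midpoint go to a, the rest to b.
        mid = bisect_right(pos, (pos[a] + pos[b]) / 2, lo, hi)
        left = range_sum(lo, mid) - (mid - lo) * pos[a]
        right = (hi - mid) * pos[b] - range_sum(mid, hi)
        return left + right

    # best[b] = (references, distance) for genes 0..b when b is the last reference.
    best = [(inf, inf)] * n
    parent = [-1] * n
    for b in range(n):
        if pos[b] - pos[0] <= d:  # b alone covers every gene before it
            best[b] = (1, b * pos[b] - range_sum(0, b))
        for a in range(b):  # O(n^2) pairs, O(log n) each
            if best[a][0] == inf:
                continue
            cost = gap_cost(a, b)
            if cost is None:
                continue
            cand = (best[a][0] + 1, best[a][1] + cost)
            if cand < best[b]:
                best[b], parent[b] = cand, a

    # The last reference must also cover every gene after it.
    answer, last = (inf, inf), -1
    for b in range(n):
        if best[b][0] < inf and pos[-1] - pos[b] <= d:
            tail = range_sum(b + 1, n) - (n - 1 - b) * pos[b]
            cand = (best[b][0], best[b][1] + tail)
            if cand < answer:
                answer, last = cand, b

    refs = []
    while last != -1:  # walk back through chosen predecessors
        refs.append(last)
        last = parent[last]
    return refs[::-1], answer[1]
```

For example, `vicinity_set_cover([0, 1, 2, 10, 11, 12], 1)` returns `([1, 4], 4)`: the genes at positions 1 and 11 each cover their two neighbours at distance 1. The key point the sketch illustrates is that, because each vicinity set is a run of consecutive genes, only pairs of consecutive references interact, which is what keeps the search polynomial instead of exponential as in general set cover.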
Availability and implementation: The source code, along with all implementation details, is available at: https://github.com/sensationTI/vicinity_set_cover.