Broňa Brejová, Travis Gagie, Eva Herencsárová, Tomáš Vinař
Frontiers in Bioinformatics, volume 4, article 1391086. Published 2024-07-01. doi: 10.3389/fbinf.2024.1391086 (https://doi.org/10.3389/fbinf.2024.1391086)
Maximum-scoring path sets on pangenome graphs of constant treewidth.
We generalize a problem of finding maximum-scoring segment sets, previously studied by Csűrös (IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004, 1, 139-150), from sequences to graphs. Namely, given a vertex-weighted graph G and a non-negative startup penalty c, we can find a set of vertex-disjoint paths in G with maximum total score when each path's score is its vertices' total weight minus c. We call this new problem maximum-scoring path sets (MSPS). We present an algorithm that has a linear-time complexity for graphs with a constant treewidth. Generalization from sequences to graphs allows the algorithm to be used on pangenome graphs representing several related genomes and can be seen as a common abstraction for several biological problems on pangenomes, including searching for CpG islands, ChIP-seq data analysis, analysis of region enrichment for functional elements, or simple chaining problems.
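To make the scoring rule concrete, here is a minimal sketch of the sequence special case only, where the graph is a single path and vertex-disjoint paths are just disjoint segments. It is not the constant-treewidth algorithm presented in the paper, and it is not claimed to be the method of Csűrös; it is a plain two-state dynamic program written for illustration. The function name `max_scoring_segments` and its return format are assumptions introduced here.

```python
from math import inf


def max_scoring_segments(weights, c):
    """Path-graph (sequence) special case of MSPS, as an illustrative sketch.

    weights: vertex weights along the sequence.
    c: non-negative startup penalty charged once per chosen segment.
    Returns (best_score, segments), segments as inclusive (start, end) indices.
    """
    n = len(weights)
    out = [0] * (n + 1)       # out[i]: best score over first i vertices, vertex i-1 unused
    inn = [-inf] * (n + 1)    # inn[i]: best score over first i vertices, vertex i-1 in a segment
    started = [False] * (n + 1)  # started[i]: the segment containing vertex i-1 begins there

    for i, w in enumerate(weights, start=1):
        out[i] = max(out[i - 1], inn[i - 1])
        extend = inn[i - 1] + w        # continue the segment that covers vertex i-2
        start = out[i - 1] + w - c     # open a new segment and pay the penalty
        if start > extend:
            inn[i], started[i] = start, True
        else:
            inn[i] = extend

    # Trace back through the two states to recover the chosen segments.
    segments = []
    in_seg = inn[n] > out[n]
    end = None
    for i in range(n, 0, -1):
        if in_seg:
            if end is None:
                end = i - 1
            if started[i]:
                segments.append((i - 1, end))
                end = None
                in_seg = False
        else:
            in_seg = inn[i - 1] > out[i - 1]
    segments.reverse()
    return max(out[n], inn[n]), segments


score, segments = max_scoring_segments([2, -1, 3, -5, 4], c=2)
print(score, segments)
```

With weights 2, -1, 3, -5, 4 and c = 2, the sketch selects the segments covering indices 0 to 2 and index 4, scoring (2 - 1 + 3 - 2) + (4 - 2) = 4. Because c is non-negative, merging two adjacent segments is never worse than keeping them separate, which is why the recurrence does not need a separate case for a segment that starts immediately after another one ends.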