Minimum Segmentation for Pan-genomic Founder Reconstruction in Linear Time

T. Norri, Bastien Cazaux, D. Kosolobov, V. Mäkinen
{"title":"Minimum Segmentation for Pan-genomic Founder Reconstruction in Linear Time","authors":"T. Norri, Bastien Cazaux, D. Kosolobov, V. Mäkinen","doi":"10.4230/LIPIcs.WABI.2018.15","DOIUrl":null,"url":null,"abstract":"Given a threshold L and a set R = {R_1, ..., R_m} of m strings (haplotype sequences), each having length n, the minimum segmentation problem for founder reconstruction is to partition [1,n] into set P of disjoint segments such that each segment [a,b] in P has length at least L and the number d(a,b)=|{R_i[a,b] : 1 <= i <= m}| of distinct substrings at segment [a,b] is minimized over [a,b] in P. The distinct substrings in the segments represent founder blocks that can be concatenated to form max{d(a,b) : [a,b] in P} founder sequences representing the original R such that crossovers happen only at segment boundaries. We give an optimal O(mn) time algorithm to solve the problem, improving over earlier O(mn^2). This improvement enables to exploit the algorithm on a pan-genomic setting of input strings being aligned haplotype sequences of complete human chromosomes, with a goal of finding a representative set of references that can be indexed for read alignment and variant calling. We implemented the new algorithm and give some experimental evidence on the practicality of the approach on this pan-genomic setting.","PeriodicalId":329847,"journal":{"name":"Workshop on Algorithms in Bioinformatics","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Algorithms in Bioinformatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4230/LIPIcs.WABI.2018.15","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

Given a threshold L and a set R = {R_1, ..., R_m} of m strings (haplotype sequences), each having length n, the minimum segmentation problem for founder reconstruction is to partition [1,n] into set P of disjoint segments such that each segment [a,b] in P has length at least L and the number d(a,b)=|{R_i[a,b] : 1 <= i <= m}| of distinct substrings at segment [a,b] is minimized over [a,b] in P. The distinct substrings in the segments represent founder blocks that can be concatenated to form max{d(a,b) : [a,b] in P} founder sequences representing the original R such that crossovers happen only at segment boundaries. We give an optimal O(mn) time algorithm to solve the problem, improving over earlier O(mn^2). This improvement enables to exploit the algorithm on a pan-genomic setting of input strings being aligned haplotype sequences of complete human chromosomes, with a goal of finding a representative set of references that can be indexed for read alignment and variant calling. We implemented the new algorithm and give some experimental evidence on the practicality of the approach on this pan-genomic setting.
线性时间泛基因组方正重构的最小分割方法
给定阈值L和集合R = {R_1,…m, R_m}字符串(单体型序列),各有长度n,创始人重建的最小分割问题是(1,n)分割成组P(分离段,每一段[a, b]至少在P长度L和维数(a, b) = | {R_i [a, b]: 1 < =我< = m} |截然不同的子段[a, b]是最小化在P [a, b]不同的子字符串的部分代表创始人块可以连接形成马克斯{d (a, b):[a,b]在表示原始R的P}方正序列中,使得交叉只发生在段边界。我们给出了一个最优的O(mn)时间算法来解决问题,改进了之前的O(mn^2)。这一改进使得该算法能够在全基因组设置的输入字符串上被利用,这些输入字符串是完整的人类染色体的单倍型序列,其目标是找到一组具有代表性的参考,这些参考可以被索引为读取比对和变体调用。我们实现了新算法,并给出了一些实验证据,证明该方法在这种泛基因组设置下的实用性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信