UniAligner: a parameter-free framework for fast sequence alignment

IF 32.1; Zone 1 (Biology); JCR Q1 (BIOCHEMICAL RESEARCH METHODS)
Andrey V. Bzikadze, Pavel A. Pevzner
Nature Methods, volume 20, issue 9, pages 1346-1354. Published 2023-08-14. Journal Article.
DOI: 10.1038/s41592-023-01970-4
URL: https://www.nature.com/articles/s41592-023-01970-4

Abstract

Although recent advances in ‘complete genomics’ have revealed previously inaccessible genomic regions, analysis of variation in centromeres and other extra-long tandem repeats (ETRs) faces an algorithmic challenge: there are currently no tools for accurate sequence comparison of ETRs. Counterintuitively, classical alignment approaches, such as the Smith–Waterman algorithm, fail to construct biologically adequate alignments of ETRs. We present UniAligner, a parameter-free sequence alignment algorithm whose sequence-dependent alignment scoring changes automatically for each pair of compared sequences. UniAligner prioritizes matches of rare substrings, which are more likely to be relevant to the evolutionary relationship between two sequences. We apply UniAligner to estimate mutation rates in human centromeres and to quantify the extremely high rate of large duplications and deletions in centromeres. This high rate suggests that centromeres may be among the most rapidly evolving regions of the human genome with respect to their structural organization.

Compared to other sequences, extra-long tandem repeats, such as centromeres and immunoglobulin loci, are more difficult to align. This study presents UniAligner, a computational method for efficiently and accurately aligning extra-long tandem repeats, facilitating analysis of their variation and evolution.
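
The intuition behind prioritizing matches of rare substrings can be illustrated with a small sketch. This is not UniAligner's algorithm: the function names, the fixed substring length k, and the max_count threshold are all assumptions made for illustration, and unlike the parameter-free method described in the abstract, this toy version needs both values chosen by hand. It only shows that substrings occurring many times inside a tandem repeat are ambiguous anchors, while substrings that are rare in both sequences are not.

```python
from collections import Counter
from typing import Dict


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def rare_shared_kmers(a: str, b: str, k: int, max_count: int = 1) -> Dict[str, int]:
    """Return k-mers that occur in both sequences at most max_count times each.

    Hypothetical helper for illustration only. The value stored for each
    k-mer is its combined number of occurrences in a and b.
    """
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    return {
        kmer: counts_a[kmer] + counts_b[kmer]
        for kmer in counts_a
        if kmer in counts_b
        and counts_a[kmer] <= max_count
        and counts_b[kmer] <= max_count
    }


if __name__ == "__main__":
    # Two toy sequences: a repetitive poly-A background with one distinctive insert.
    a = "AAAAACGTAAAA"
    b = "AAAACGTAAAAA"
    for kmer in sorted(rare_shared_kmers(a, b, k=4)):
        print(kmer)
```

In this toy input the repeated k-mer AAAA is filtered out because it occurs several times in each sequence, while the k-mers overlapping the CGT insert survive as candidate anchors.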


Source journal: Nature Methods (Biology, Biochemical Research Methods)
CiteScore: 58.70
Self-citation rate: 1.70%
Articles published: 326
Review time: 1 month
About the journal: Nature Methods is a monthly journal that focuses on publishing innovative methods and substantial enhancements to fundamental life sciences research techniques. Geared towards a diverse, interdisciplinary readership of researchers in academia and industry engaged in laboratory work, the journal offers new tools for research and emphasizes the immediate practical significance of the featured work. It publishes primary research papers and reviews recent technical and methodological advancements, with a particular interest in primary methods papers relevant to the biological and biomedical sciences. This includes methods rooted in chemistry with practical applications for studying biological problems.