EMMA:给定约束子集排列的计算多序列排列的新方法

IF 1.5 4区 生物学 Q4 BIOCHEMICAL RESEARCH METHODS
Chengze Shen, Baqiao Liu, Kelly P. Williams, Tandy Warnow
{"title":"EMMA:给定约束子集排列的计算多序列排列的新方法","authors":"Chengze Shen, Baqiao Liu, Kelly P. Williams, Tandy Warnow","doi":"10.1186/s13015-023-00247-x","DOIUrl":null,"url":null,"abstract":"Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, or computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information with high fidelity. We present EMMA (Extending Multiple alignments using MAFFT--add) for the problem of adding a set of unaligned sequences into a multiple sequence alignment (i.e., a constraint alignment). EMMA builds on MAFFT--add, which is also designed to add sequences into a given constraint alignment. EMMA improves on MAFFT--add methods by using a divide-and-conquer framework to scale its most accurate version, MAFFT-linsi--add, to constraint alignments with many sequences. We show that EMMA has an accuracy advantage over other techniques for adding sequences into alignments under many realistic conditions and can scale to large datasets with high accuracy (hundreds of thousands of sequences). EMMA is available at https://github.com/c5shen/EMMA . EMMA is a new tool that provides high accuracy and scalability for adding sequences into an existing alignment.","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"23 1","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment\",\"authors\":\"Chengze Shen, Baqiao Liu, Kelly P. Williams, Tandy Warnow\",\"doi\":\"10.1186/s13015-023-00247-x\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, or computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information with high fidelity. We present EMMA (Extending Multiple alignments using MAFFT--add) for the problem of adding a set of unaligned sequences into a multiple sequence alignment (i.e., a constraint alignment). EMMA builds on MAFFT--add, which is also designed to add sequences into a given constraint alignment. EMMA improves on MAFFT--add methods by using a divide-and-conquer framework to scale its most accurate version, MAFFT-linsi--add, to constraint alignments with many sequences. We show that EMMA has an accuracy advantage over other techniques for adding sequences into alignments under many realistic conditions and can scale to large datasets with high accuracy (hundreds of thousands of sequences). EMMA is available at https://github.com/c5shen/EMMA . EMMA is a new tool that provides high accuracy and scalability for adding sequences into an existing alignment.\",\"PeriodicalId\":50823,\"journal\":{\"name\":\"Algorithms for Molecular Biology\",\"volume\":\"23 1\",\"pages\":\"\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2023-12-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Algorithms for Molecular Biology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s13015-023-00247-x\",\"RegionNum\":4,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms for Molecular Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13015-023-00247-x","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 1

摘要

将序列添加到现有的(可能是用户提供的)比对中有多种应用,包括用新数据更新大型比对、将序列添加到用生物知识构建的约束比对中,或在序列长度异质性的情况下计算比对。虽然这是一个自然问题,但目前只有少数工具能高保真地使用这些信息。我们提出的 EMMA(使用 MAFFT--add扩展多序列对齐)可解决将一组未对齐序列添加到多序列对齐(即约束对齐)中的问题。EMMA建立在MAFFT--add的基础上,MAFFT--add也是为了将序列添加到给定的约束比对中而设计的。EMMA改进了MAFFT--add方法,使用分而治之的框架将其最精确的版本MAFFT--linsi--add扩展到多序列的约束对齐。我们的研究表明,在许多现实条件下,EMMA在将序列添加到对齐中方面比其他技术更准确,而且能以高准确度(数十万条序列)扩展到大型数据集。EMMA 可在 https://github.com/c5shen/EMMA 上获取。EMMA是一种新工具,可将序列添加到现有的排列中,具有高准确性和可扩展性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment
Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, or computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information with high fidelity. We present EMMA (Extending Multiple alignments using MAFFT--add) for the problem of adding a set of unaligned sequences into a multiple sequence alignment (i.e., a constraint alignment). EMMA builds on MAFFT--add, which is also designed to add sequences into a given constraint alignment. EMMA improves on MAFFT--add methods by using a divide-and-conquer framework to scale its most accurate version, MAFFT-linsi--add, to constraint alignments with many sequences. We show that EMMA has an accuracy advantage over other techniques for adding sequences into alignments under many realistic conditions and can scale to large datasets with high accuracy (hundreds of thousands of sequences). EMMA is available at https://github.com/c5shen/EMMA . EMMA is a new tool that provides high accuracy and scalability for adding sequences into an existing alignment.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Algorithms for Molecular Biology
Algorithms for Molecular Biology 生物-生化研究方法
CiteScore
2.40
自引率
10.00%
发文量
16
审稿时长
>12 weeks
期刊介绍: Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning. Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms. Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信