Single-character insertion-deletion model preserves long indels in ancestral sequence reconstruction.

IF 2.9 3区 生物学 Q2 BIOCHEMICAL RESEARCH METHODS
Gholamhossein Jowkar, Jūlija Pečerska, Manuel Gil, Maria Anisimova
{"title":"Single-character insertion-deletion model preserves long indels in ancestral sequence reconstruction.","authors":"Gholamhossein Jowkar, Jūlija Pečerska, Manuel Gil, Maria Anisimova","doi":"10.1186/s12859-024-05986-1","DOIUrl":null,"url":null,"abstract":"<p><p>Insertions and deletions (indels) play a significant role in genome evolution across species. Realistic modelling of indel evolution is challenging and is still an open research question. Several attempts have been made to explicitly model multi-character (long) indels, such as TKF92, by relaxing the site independence assumption and introducing fragments. However, these methods are computationally expensive. On the other hand, the Poisson Indel Process (PIP) assumes site independence but allows one to infer single-character indels on the phylogenetic tree, distinguishing insertions from deletions. PIP's marginal likelihood computation has linear time complexity, enabling ancestral sequence reconstruction (ASR) with indels in linear time. Recently, we developed ARPIP, an ASR method using PIP, capable of inferring indel events with explicit evolutionary interpretations. Here, we investigate the effect of the single-character indel assumption on reconstructed ancestral sequences on mammalian protein orthologs and on simulated data. We show that ARPIP's ancestral estimates preserve the gap length distribution observed in the input alignment. In mammalian proteins the lengths of inserted segments appear to be substantially longer compared to deleted segments. Further, we confirm the well-established deletion bias observed in real data. To date, ARPIP is the only ancestral reconstruction method that explicitly models insertion and deletion events over time. Given a good quality input alignment, it can capture ancestral long indel events on the phylogeny.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"370"},"PeriodicalIF":2.9000,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11610121/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-024-05986-1","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Insertions and deletions (indels) play a significant role in genome evolution across species. Realistic modelling of indel evolution is challenging and is still an open research question. Several attempts have been made to explicitly model multi-character (long) indels, such as TKF92, by relaxing the site independence assumption and introducing fragments. However, these methods are computationally expensive. On the other hand, the Poisson Indel Process (PIP) assumes site independence but allows one to infer single-character indels on the phylogenetic tree, distinguishing insertions from deletions. PIP's marginal likelihood computation has linear time complexity, enabling ancestral sequence reconstruction (ASR) with indels in linear time. Recently, we developed ARPIP, an ASR method using PIP, capable of inferring indel events with explicit evolutionary interpretations. Here, we investigate the effect of the single-character indel assumption on reconstructed ancestral sequences on mammalian protein orthologs and on simulated data. We show that ARPIP's ancestral estimates preserve the gap length distribution observed in the input alignment. In mammalian proteins the lengths of inserted segments appear to be substantially longer compared to deleted segments. Further, we confirm the well-established deletion bias observed in real data. To date, ARPIP is the only ancestral reconstruction method that explicitly models insertion and deletion events over time. Given a good quality input alignment, it can capture ancestral long indel events on the phylogeny.

单字符插入-删除模型在祖先序列重建中保留了长索引。
插入和缺失(indels)在跨物种基因组进化中起着重要作用。indel进化的现实建模是具有挑战性的,仍然是一个开放的研究问题。通过放松站点独立性假设和引入片段,已经进行了多次尝试,以显式地建模多字符(长)索引,例如TKF92。然而,这些方法在计算上是昂贵的。另一方面,Poisson Indel Process (PIP)假定位点独立,但允许在系统发育树上推断单字符索引,区分插入和删除。PIP的边际似然计算具有线性时间复杂度,可以在线性时间内实现索引的祖先序列重建。最近,我们开发了ARPIP,一种使用PIP的ASR方法,能够用明确的进化解释推断indel事件。在这里,我们研究了单字符indel假设对哺乳动物蛋白质同源重构祖先序列和模拟数据的影响。我们证明ARPIP的祖先估计保留了在输入对齐中观察到的间隙长度分布。在哺乳动物蛋白质中,插入片段的长度似乎比缺失片段的长度要长得多。此外,我们证实了在实际数据中观察到的完善的删除偏差。迄今为止,ARPIP是唯一一种显式地模拟插入和删除事件的祖先重建方法。给定高质量的输入对齐,它可以捕获系统发育上的祖先长indel事件。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
BMC Bioinformatics
BMC Bioinformatics 生物-生化研究方法
CiteScore
5.70
自引率
3.30%
发文量
506
审稿时长
4.3 months
期刊介绍: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信