Repeat-Rich Regions Cause False-Positive Detection of NUMTs: A Case Study in Amphibians Using an Improved Cane Toad Reference Genome.

IF 3.2; Zone 2 (Biology); JCR Q2 (Evolutionary Biology)
Kelton Cheung, Lee Ann Rollins, Jillian M Hammond, Kirston Barton, James M Ferguson, Harrison J F Eyck, Richard Shine, Richard J Edwards
Genome Biology and Evolution. Published 2024-11-01. DOI: 10.1093/gbe/evae246 (https://doi.org/10.1093/gbe/evae246). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11606642/pdf/

Abstract

Mitochondrial DNA (mtDNA) has been widely used in genetics research for decades. Contamination from nuclear DNA of mitochondrial origin (NUMTs) can confound studies of phylogenetic relationships and mtDNA heteroplasmy. Homology searches with mtDNA are widely used to detect NUMTs in the nuclear genome. Nevertheless, false-positive detection of NUMTs is common when handling repeat-rich sequences, while fragmented genomes might result in missing true NUMTs. In this study, we investigated different NUMT detection methods and how the quality of the genome assembly affects them. We presented an improved nuclear genome assembly (aRhiMar1.3) of the invasive cane toad (Rhinella marina) with additional long-read Nanopore and 10× linked-read sequencing. The final assembly was 3.47 Gb in length with 91.3% of tetrapod universal single-copy orthologs (n = 5,310), indicating the gene-containing regions were well assembled. We used 3 complementary methods (NUMTFinder, dinumt, and PALMER) to study the NUMT landscape of the cane toad genome. All 3 methods yielded consistent results, showing very few NUMTs in the cane toad genome. Furthermore, we expanded NUMT detection analyses to other amphibians and confirmed a weak relationship between genome size and the number of NUMTs present in the nuclear genome. Amphibians are repeat-rich, and we show that the number of NUMTs found in highly repetitive genomes is prone to inflation when using homology-based detection without filters. Together, this study provides an exemplar of how to robustly identify NUMTs in complex genomes when confounding effects on mtDNA analyses are a concern.
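The abstract's point about unfiltered homology searches can be illustrated with a minimal sketch. It assumes the mtDNA-versus-nuclear search output is in standard 12-column BLAST tabular format and that repeat annotations are available as per-contig, non-overlapping intervals. The function names, the example rows, and the thresholds (`min_len`, `max_evalue`, `max_repeat_frac`) are hypothetical illustrations, not the settings or implementation of NUMTFinder, dinumt, or PALMER.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    contig: str
    start: int      # 1-based, inclusive, always <= end
    end: int
    evalue: float


def parse_blast_tab(lines):
    """Parse 12-column BLAST tabular rows into Hit records on the subject."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        sstart, send = int(f[8]), int(f[9])  # reversed on minus-strand hits
        hits.append(Hit(f[1], min(sstart, send), max(sstart, send), float(f[10])))
    return hits


def repeat_overlap(hit, repeats):
    """Count hit bases covered by repeat intervals (assumed non-overlapping)."""
    covered = 0
    for r_start, r_end in repeats.get(hit.contig, []):
        covered += max(0, min(hit.end, r_end) - max(hit.start, r_start) + 1)
    return covered


def filter_hits(hits, repeats, min_len=100, max_evalue=1e-10, max_repeat_frac=0.5):
    """Keep hits that are long, significant, and not mostly repeat sequence.

    All three thresholds are illustrative assumptions.
    """
    kept = []
    for h in hits:
        length = h.end - h.start + 1
        if length < min_len or h.evalue > max_evalue:
            continue
        if repeat_overlap(h, repeats) / length > max_repeat_frac:
            continue
        kept.append(h)
    return kept


# Made-up example rows: one clean hit, one inside a repeat, one too short.
rows = [
    "mt\tchr1\t95.0\t500\t25\t0\t1\t500\t1000\t1499\t1e-50\t800",
    "mt\tchr1\t90.0\t300\t30\t0\t600\t899\t5300\t5001\t1e-30\t400",
    "mt\tchr2\t99.0\t40\t0\t0\t10\t49\t200\t239\t1e-5\t60",
]
repeats = {"chr1": [(5000, 6000)]}  # hypothetical repeat annotation
kept = filter_hits(parse_blast_tab(rows), repeats)
print(kept)
```

In this toy input only the first hit survives: the second lies entirely within an annotated repeat and the third fails the length and e-value cut-offs. Without the repeat filter the candidate count would double, which is the kind of inflation the abstract describes for repeat-rich genomes.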

Source journal: Genome Biology and Evolution (Evolutionary Biology; Genetics & Heredity)
CiteScore: 5.80
Self-citation rate: 6.10%
Articles published: 169
Review time: 1 month
About the journal: Genome Biology and Evolution (GBE) publishes leading original research at the interface between evolutionary biology and genomics. Papers considered for publication report novel evolutionary findings that concern natural genome diversity, population genomics, the structure, function, organisation and expression of genomes, comparative genomics, proteomics, and environmental genomic interactions. Major evolutionary insights from the fields of computational biology, structural biology, developmental biology, and cell biology are also considered, as are theoretical advances in the field of genome evolution. GBE’s scope embraces genome-wide evolutionary investigations at all taxonomic levels and for all forms of life, within populations or across domains. Its aims are to further the understanding of genomes in their evolutionary context and further the understanding of evolution from a genome-wide perspective.