Kelton Cheung, Lee Ann Rollins, Jillian M Hammond, Kirston Barton, James M Ferguson, Harrison J F Eyck, Richard Shine, Richard J Edwards
{"title":"富重复区域导致 NUMTs 的假阳性检测:利用改进的蔗蟾蜍参考基因组对两栖动物进行的案例研究。","authors":"Kelton Cheung, Lee Ann Rollins, Jillian M Hammond, Kirston Barton, James M Ferguson, Harrison J F Eyck, Richard Shine, Richard J Edwards","doi":"10.1093/gbe/evae246","DOIUrl":null,"url":null,"abstract":"<p><p>Mitochondrial DNA (mtDNA) has been widely used in genetics research for decades. Contamination from nuclear DNA of mitochondrial origin (NUMTs) can confound studies of phylogenetic relationships and mtDNA heteroplasmy. Homology searches with mtDNA are widely used to detect NUMTs in the nuclear genome. Nevertheless, false-positive detection of NUMTs is common when handling repeat-rich sequences, while fragmented genomes might result in missing true NUMTs. In this study, we investigated different NUMT detection methods and how the quality of the genome assembly affects them. We presented an improved nuclear genome assembly (aRhiMar1.3) of the invasive cane toad (Rhinella marina) with additional long-read Nanopore and 10× linked-read sequencing. The final assembly was 3.47 Gb in length with 91.3% of tetrapod universal single-copy orthologs (n = 5,310), indicating the gene-containing regions were well assembled. We used 3 complementary methods (NUMTFinder, dinumt, and PALMER) to study the NUMT landscape of the cane toad genome. All 3 methods yielded consistent results, showing very few NUMTs in the cane toad genome. Furthermore, we expanded NUMT detection analyses to other amphibians and confirmed a weak relationship between genome size and the number of NUMTs present in the nuclear genome. Amphibians are repeat-rich, and we show that the number of NUMTs found in highly repetitive genomes is prone to inflation when using homology-based detection without filters. Together, this study provides an exemplar of how to robustly identify NUMTs in complex genomes when confounding effects on mtDNA analyses are a concern.</p>","PeriodicalId":12779,"journal":{"name":"Genome Biology and Evolution","volume":" ","pages":""},"PeriodicalIF":3.2000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11606642/pdf/","citationCount":"0","resultStr":"{\"title\":\"Repeat-Rich Regions Cause False-Positive Detection of NUMTs: A Case Study in Amphibians Using an Improved Cane Toad Reference Genome.\",\"authors\":\"Kelton Cheung, Lee Ann Rollins, Jillian M Hammond, Kirston Barton, James M Ferguson, Harrison J F Eyck, Richard Shine, Richard J Edwards\",\"doi\":\"10.1093/gbe/evae246\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Mitochondrial DNA (mtDNA) has been widely used in genetics research for decades. Contamination from nuclear DNA of mitochondrial origin (NUMTs) can confound studies of phylogenetic relationships and mtDNA heteroplasmy. Homology searches with mtDNA are widely used to detect NUMTs in the nuclear genome. Nevertheless, false-positive detection of NUMTs is common when handling repeat-rich sequences, while fragmented genomes might result in missing true NUMTs. In this study, we investigated different NUMT detection methods and how the quality of the genome assembly affects them. We presented an improved nuclear genome assembly (aRhiMar1.3) of the invasive cane toad (Rhinella marina) with additional long-read Nanopore and 10× linked-read sequencing. The final assembly was 3.47 Gb in length with 91.3% of tetrapod universal single-copy orthologs (n = 5,310), indicating the gene-containing regions were well assembled. We used 3 complementary methods (NUMTFinder, dinumt, and PALMER) to study the NUMT landscape of the cane toad genome. All 3 methods yielded consistent results, showing very few NUMTs in the cane toad genome. Furthermore, we expanded NUMT detection analyses to other amphibians and confirmed a weak relationship between genome size and the number of NUMTs present in the nuclear genome. Amphibians are repeat-rich, and we show that the number of NUMTs found in highly repetitive genomes is prone to inflation when using homology-based detection without filters. Together, this study provides an exemplar of how to robustly identify NUMTs in complex genomes when confounding effects on mtDNA analyses are a concern.</p>\",\"PeriodicalId\":12779,\"journal\":{\"name\":\"Genome Biology and Evolution\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2024-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11606642/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Genome Biology and Evolution\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/gbe/evae246\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"EVOLUTIONARY BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genome Biology and Evolution","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/gbe/evae246","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"EVOLUTIONARY BIOLOGY","Score":null,"Total":0}
Repeat-Rich Regions Cause False-Positive Detection of NUMTs: A Case Study in Amphibians Using an Improved Cane Toad Reference Genome.
Mitochondrial DNA (mtDNA) has been widely used in genetics research for decades. Contamination from nuclear DNA of mitochondrial origin (NUMTs) can confound studies of phylogenetic relationships and mtDNA heteroplasmy. Homology searches with mtDNA are widely used to detect NUMTs in the nuclear genome. Nevertheless, false-positive detection of NUMTs is common when handling repeat-rich sequences, while fragmented genomes might result in missing true NUMTs. In this study, we investigated different NUMT detection methods and how the quality of the genome assembly affects them. We presented an improved nuclear genome assembly (aRhiMar1.3) of the invasive cane toad (Rhinella marina) with additional long-read Nanopore and 10× linked-read sequencing. The final assembly was 3.47 Gb in length with 91.3% of tetrapod universal single-copy orthologs (n = 5,310), indicating the gene-containing regions were well assembled. We used 3 complementary methods (NUMTFinder, dinumt, and PALMER) to study the NUMT landscape of the cane toad genome. All 3 methods yielded consistent results, showing very few NUMTs in the cane toad genome. Furthermore, we expanded NUMT detection analyses to other amphibians and confirmed a weak relationship between genome size and the number of NUMTs present in the nuclear genome. Amphibians are repeat-rich, and we show that the number of NUMTs found in highly repetitive genomes is prone to inflation when using homology-based detection without filters. Together, this study provides an exemplar of how to robustly identify NUMTs in complex genomes when confounding effects on mtDNA analyses are a concern.
期刊介绍:
About the journal
Genome Biology and Evolution (GBE) publishes leading original research at the interface between evolutionary biology and genomics. Papers considered for publication report novel evolutionary findings that concern natural genome diversity, population genomics, the structure, function, organisation and expression of genomes, comparative genomics, proteomics, and environmental genomic interactions. Major evolutionary insights from the fields of computational biology, structural biology, developmental biology, and cell biology are also considered, as are theoretical advances in the field of genome evolution. GBE’s scope embraces genome-wide evolutionary investigations at all taxonomic levels and for all forms of life — within populations or across domains. Its aims are to further the understanding of genomes in their evolutionary context and further the understanding of evolution from a genome-wide perspective.