mtDNA "nomenclutter" and its consequences on the interpretation of genetic data.

BMC Ecology and Evolution 24(1):110 (impact factor 2.3; JCR Q2, Ecology)

Publication date: 2024-08-19. DOI: 10.1186/s12862-024-02288-1

Vladimir Bajić, Vanessa Hava Schulmann, Katja Nowick


Abstract

Population-based studies of human mitochondrial genetic diversity often require the classification of mitochondrial DNA (mtDNA) haplotypes into more than 5400 described haplogroups, and further grouping those into hierarchically higher haplogroups. Such secondary haplogroup groupings (e.g., "macro-haplogroups") vary across studies, as they depend on the sample quality, technical factors of haplogroup calling, the aims of the study, and the researchers' understanding of the mtDNA haplogroup nomenclature. Retention of historical nomenclature coupled with a growing number of newly described mtDNA lineages results in increasingly complex and inconsistent nomenclature that does not reflect phylogeny well. This "clutter" leaves room for grouping errors and inconsistencies across scientific publications, especially when the haplogroup names are used as a proxy for secondary groupings, and represents a source for scientific misinterpretation. Here we explore the effects of phylogenetically insensitive secondary mtDNA haplogroup groupings, and the lack of standardized secondary haplogroup groupings on downstream analyses and interpretation of genetic data. We demonstrate that frequency-based analyses produce inconsistent results when different secondary mtDNA groupings are applied, and thus allow for vastly different interpretations of the same genetic data. The lack of guidelines and recommendations on how to choose appropriate secondary haplogroup groupings presents an issue for the interpretation of results, as well as their comparison and reproducibility across studies. To reduce biases originating from arbitrarily defined secondary nomenclature-based groupings, we suggest that future updates of mtDNA phylogenies aimed for the use in mtDNA haplogroup nomenclature should also provide well-defined and standardized sets of phylogenetically meaningful algorithm-based secondary haplogroup groupings such as "macro-haplogroups", "meso-haplogroups", and "micro-haplogroups". 
Ideally, each of the secondary haplogroup grouping levels should be informative about different human population history events. Those phylogenetically informative levels of haplogroup groupings can be easily defined using TreeCluster, and then implemented into haplogroup callers such as HaploGrep3. This would foster reproducibility across studies, provide a grouping standard for population-based studies, and reduce errors associated with haplogroup nomenclatures in future studies.
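The dependence of frequency-based results on the chosen secondary grouping can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two populations are hypothetical toy data, the `PARENT` table is a heavily simplified fragment of the mtDNA tree (K nested in U; H, J, T, U nested in R; R nested in N), and the helper names `by_first_letter`, `by_ancestor`, and `frequencies` are invented here. It is not the authors' analysis and does not use TreeCluster or HaploGrep3.

```python
from collections import Counter

# Hypothetical haplogroup calls for two toy populations (not real data).
POP_A = ["H1a", "H2a", "U5b", "K1a", "J1c", "T2b"]
POP_B = ["U5a", "U4", "K1a", "K2a", "H1", "J2b"]

# Simplified fragment of the mtDNA tree (assumption: only the nesting
# needed for this toy example is encoded).
PARENT = {"K": "U", "U": "R", "H": "R", "J": "R", "T": "R",
          "R": "N", "N": "L3", "M": "L3"}


def by_first_letter(haplogroup):
    """Nomenclature-based grouping: use the first letter of the name."""
    return haplogroup[0]


def by_ancestor(haplogroup, targets):
    """Phylogeny-aware grouping: walk up PARENT until a target clade is hit.

    The first letter is used as the starting node, which is a simplification
    that only works for the single-letter clades in this toy example.
    """
    node = haplogroup[0]
    while node not in targets and node in PARENT:
        node = PARENT[node]
    return node


def frequencies(calls, group):
    """Relative frequency of each group among the haplogroup calls."""
    counts = Counter(group(call) for call in calls)
    return {name: counts[name] / len(calls) for name in sorted(counts)}


if __name__ == "__main__":
    schemes = {
        "first letter": by_first_letter,
        "K folded into U": lambda h: by_ancestor(h, {"H", "J", "T", "U"}),
        "collapsed to R": lambda h: by_ancestor(h, {"R"}),
    }
    for label, group in schemes.items():
        print(label)
        print("  A:", frequencies(POP_A, group))
        print("  B:", frequencies(POP_B, group))
```

In this toy setting, the first-letter scheme reports K and U as separate groups, the second scheme merges them because K is nested within U, and the third makes the two populations indistinguishable. The same calls therefore support different readings depending on the grouping, which is the kind of inconsistency the abstract describes.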


