Introducing the Y-chromosomal Ancestral-like Reference Sequence: Improving the Capture of Human Evolutionary Information

IF 5.3 | Zone 1 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Molecular Biology and Evolution | Pub Date: 2025-10-01 | DOI: 10.1093/molbev/msaf222

Zehra Köksal, Annina Preussner, Jaakko Leinonen, Taru Tukiainen

Citations: 0

Abstract

Reference sequences are essential for reproducible genetic analyses but are often chosen without regard to their evolutionary relevance within the analyzed species. The human Y chromosome (chrY) is widely used in evolutionary studies, yet current references represent evolutionarily young sequences, which can make variant calls misleading. To address this issue, we constructed a Y-chromosomal ancestral-like reference sequence (Y-ARS) to improve the detection of evolutionarily informative variants on chrY. The Y-ARS was constructed by applying a weighted maximum parsimony approach to human and primate chrY sequences. To benchmark its performance, 40 chrY short-read sequences from diverse haplogroups were aligned to the Y-ARS and to existing references (GRCh37, GRCh38, and T2T-CHM13). Overall, the Y-ARS yielded the highest and most consistent number of SNPs per sample (mean = 1,400; SD = 77), while the other references yielded fewer variants on average (mean = 866 to 968) and greater variability across samples (SD = 457 to 531), depending on each sample's phylogenetic distance from the reference. Additionally, alignments to the Y-ARS yielded only SNPs carrying evolutionarily derived alleles, whereas with the other references, on average 46% of the called SNPs carried ancestral alleles. This study demonstrates that existing reference sequences fail to capture the full range of evolutionary information on chrY. The Y-ARS improves the capture of this information, making it a valuable resource for evolutionary applications such as TMRCA estimation and phylogenetic analysis.
Finally, alongside the Y-ARS, we provide a publicly available tool, polaryzer, to annotate variants as ancestral or derived in pre-aligned chrY data.
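To make the two ideas in the abstract concrete, here is a minimal sketch of weighted maximum parsimony (the Sankoff algorithm) for a single alignment column, followed by labeling called alleles as ancestral or derived. It is not the authors' Y-ARS pipeline or the polaryzer tool. The tree, the leaf alleles, and the transition/transversion weights are hypothetical, and the helper names (`sankoff`, `infer_root_state`, `call_polarity`) are illustrative only. It infers the most parsimonious state at the root of a toy tree that includes primate outgroups. It then shows why reference choice matters: against a reference carrying the derived allele, a sample with the ancestral allele produces a SNP call whose alternate allele is ancestral, whereas against an ancestral-like reference every called alternate allele is derived.

```python
"""Toy sketch of weighted (Sankoff) parsimony for one alignment column,
followed by ancestral/derived polarization of called alleles.

All inputs are hypothetical: the tree, the leaf alleles, and the
transition/transversion weights are illustrative, not the paper's settings.
"""
from math import inf
from typing import Optional

BASES = "ACGT"
PURINES = {"A", "G"}


def step_cost(a: str, b: str) -> float:
    """Hypothetical weights: 0 for no change, 1 for a transition, 2 for a transversion."""
    if a == b:
        return 0.0
    is_transition = (a in PURINES) == (b in PURINES)
    return 1.0 if is_transition else 2.0


def sankoff(node) -> dict:
    """Minimum weighted cost of the subtree below `node` for each state at `node`.

    A leaf is a one-letter base string; an internal node is a tuple of children.
    """
    if isinstance(node, str):
        return {s: (0.0 if s == node else inf) for s in BASES}
    child_costs = [sankoff(child) for child in node]
    return {
        s: sum(min(cc[t] + step_cost(s, t) for t in BASES) for cc in child_costs)
        for s in BASES
    }


def infer_root_state(tree) -> str:
    """Most parsimonious state at the root; ties go to the earlier base in BASES."""
    costs = sankoff(tree)
    return min(BASES, key=lambda s: costs[s])


def polarize(allele: str, ancestral: str) -> str:
    """Label an allele relative to the inferred ancestral state."""
    return "ancestral" if allele == ancestral else "derived"


def call_polarity(ref: str, sample: str, ancestral: str) -> Optional[str]:
    """None if the sample matches the reference (no SNP is called);
    otherwise the polarity of the called alternate allele."""
    if sample == ref:
        return None
    return polarize(sample, ancestral)


if __name__ == "__main__":
    # Hypothetical column: three human samples, then two primate outgroups.
    tree = ((("T", "T"), "C"), ("C", "C"))
    ancestral = infer_root_state(tree)
    print("inferred ancestral state:", ancestral)
    # A young reference carrying derived T vs an ancestral-like reference carrying C.
    for ref in ("T", ancestral):
        print(ref, [call_polarity(ref, s, ancestral) for s in ("T", "C")])
```

Run as a script, this toy example infers C as the root state (weighted cost 1). Against the T reference, the C sample is called as a SNP with an ancestral alternate allele; against the C reference, only the T sample is called, and its alternate allele is derived. A real ancestral-like reference would apply this kind of reconstruction across whole-chromosome alignments, with weighting choices this sketch does not attempt to reproduce.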

Source journal

Molecular Biology and Evolution (Biology: Evolutionary Biology)

CiteScore: 19.70

Self-citation rate: 3.70%

Articles published: 257

Review time: 1 month

Journal overview: Molecular Biology and Evolution publishes research at the interface of molecular biology (including genomics) and evolutionary biology. It considers manuscripts that address patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic. It is interested in fundamental discoveries and in new and improved methods, resources, technologies, and theories that advance evolutionary research. It also publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives that suggest future directions in molecular evolution applications.