Introducing the Y-chromosomal Ancestral-like Reference Sequence: Improving the Capture of Human Evolutionary Information

IF 5.3 | Zone 1, Biology | JCR Q1, Biochemistry & Molecular Biology
Zehra Köksal, Annina Preussner, Jaakko Leinonen, Taru Tukiainen
Journal: Molecular Biology and Evolution
Publication date: 2025-10-01
Publication type: Journal Article
DOI: https://doi.org/10.1093/molbev/msaf222
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12485614/pdf/
Citations: 0

Abstract

Introducing the Y-chromosomal Ancestral-like Reference Sequence-Improving the Capture of Human Evolutionary Information.

Reference sequences are essential for reproducible genetic analyses, but they are often chosen without regard to their evolutionary relevance within the analyzed species. The human Y chromosome is widely used in evolutionary studies, yet current references represent evolutionarily young sequences, which can lead to misleading variant calls. To address this issue, we constructed a Y-chromosomal ancestral-like reference sequence (Y-ARS) to improve the detection of evolutionarily informative variants on the Y chromosome. The Y-ARS was constructed by applying a weighted maximum parsimony approach to human and primate Y chromosome sequences. To benchmark its performance, 40 Y chromosome short-read sequences from diverse haplogroups were aligned to the Y-ARS and to existing references (GRCh37, GRCh38, and T2T-CHM13). Overall, the Y-ARS yielded the highest and most consistent number of SNPs per sample (mean = 1,400; SD = 77), whereas the other references yielded fewer variants on average (mean = 866 to 968) and showed greater variability across samples (SD = 457 to 531), depending on each sample's phylogenetic distance from the reference. Additionally, alignments to the Y-ARS yielded solely SNPs with evolutionarily derived alleles, whereas on average 46% of the SNPs called against the other references carried ancestral alleles. This study demonstrates that existing reference sequences fail to capture the full range of evolutionary information on the Y chromosome. The Y-ARS improves the capture of this information, making it a valuable resource for evolutionary applications such as TMRCA estimation and phylogenetic analyses. Finally, alongside the Y-ARS, we provide a publicly available tool, polaryzer, to annotate variants as ancestral or derived in pre-aligned Y chromosome data.
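
The abstract does not spell out the weighting scheme or the tree used to build the Y-chromosomal ancestral-like reference sequence, so the following is only a minimal sketch of the general idea, not the authors' method and not how polaryzer works internally. It applies Sankoff's algorithm (a standard form of weighted parsimony) to a single aligned site and then labels sample alleles as ancestral or derived relative to the inferred state. The transition/transversion costs, the toy tree, the sample names, and the function names are all assumptions made for illustration.

```python
from math import inf

BASES = "ACGT"
PURINES = {"A", "G"}


def substitution_cost(a, b, transition=1.0, transversion=2.0):
    """Assumed cost scheme: transitions are cheaper than transversions."""
    if a == b:
        return 0.0
    same_class = (a in PURINES) == (b in PURINES)
    return transition if same_class else transversion


def sankoff(tree, leaf_alleles):
    """Return {base: minimal cost} for the subtree rooted at `tree`.

    `tree` is either a leaf name (str) or a tuple of child subtrees.
    """
    if isinstance(tree, str):
        return {b: (0.0 if b == leaf_alleles[tree] else inf) for b in BASES}
    child_costs = [sankoff(child, leaf_alleles) for child in tree]
    return {
        parent: sum(
            min(costs[c] + substitution_cost(parent, c) for c in BASES)
            for costs in child_costs
        )
        for parent in BASES
    }


def ancestral_allele(tree, leaf_alleles):
    """Most parsimonious base at the root node, or None when the minimum is tied."""
    costs = sankoff(tree, leaf_alleles)
    best = min(costs.values())
    winners = [b for b, c in costs.items() if c == best]
    return winners[0] if len(winners) == 1 else None


def polarize(observed, ancestral):
    """Label a sample allele relative to the inferred ancestral allele."""
    if ancestral is None:
        return "unresolved"
    return "ancestral" if observed == ancestral else "derived"


# Hypothetical toy site (not data from the study). The tree is written with the
# human common ancestor as the root node, with the primate outgroups as one
# child clade, so the returned state refers to that node. This rerooting is
# valid here because the assumed substitution costs are symmetric.
TREE = ("human_hgA", "human_hgR", ("chimpanzee", "gorilla"))
SITE = {"human_hgA": "G", "human_hgR": "A", "chimpanzee": "G", "gorilla": "G"}

root = ancestral_allele(TREE, SITE)
print(root, polarize(SITE["human_hgR"], root), polarize(SITE["human_hgA"], root))
# prints: G derived ancestral
```

In this toy case, a reference built from the hypothetical `human_hgR` sequence would carry the derived `A`, so a sample carrying the ancestral `G` would be reported as a variant. That is the kind of ancestral-allele call the abstract attributes to evolutionarily young references.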

Source journal
Molecular Biology and Evolution (Biology: Evolutionary Biology)
CiteScore: 19.70
Self-citation rate: 3.70%
Articles published: 257
Review time: 1 month
About the journal: Molecular Biology and Evolution publishes research at the interface of molecular (including genomics) and evolutionary biology. It considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic. It is interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research. It also publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.