Sheep pan-genome retrieves the lost sequences and genes during domestication and selection

IF 3.4 | Zone 2, Biology | Q2, Biotechnology & Applied Microbiology
Jiaxin Liu, Dongxin Mo, Lingyun Luo, Yilong Shi, Songsong Xu
Genomics, vol. 117, no. 3, Article 111047. Published 2025-04-19. DOI: 10.1016/j.ygeno.2025.111047
https://www.sciencedirect.com/science/article/pii/S0888754325000631

Abstract

The reference genome plays a crucial role in uncovering genomic variation, which in turn improves our understanding of the molecular mechanisms influencing biological traits. However, most sheep reference genomes derive from a single individual and therefore cannot adequately represent the genetic diversity of sheep. We used the map-to-pan strategy to construct a sheep pan-genome from short-read whole-genome sequencing data of 801 samples, comprising 724 domestic individuals from 151 sheep populations/breeds and 77 wild individuals from seven species of the genus Ovis. In total, 195 Mb of nonreference sequences absent from the ARS-UI_Ramb_v2.0 reference were assembled. The MAKER2 pipeline, which integrates ab initio gene prediction, RNA-Seq, and protein homology, was used to annotate the nonreference sequences, and an additional 2678 genes were predicted in them. We also identified 13,317 novel single nucleotide polymorphisms (SNPs) by mapping the sequences that could not be aligned to ARS-UI_Ramb_v2.0 onto the nonreference sequences. Population genetic analyses based on the novel SNPs, including principal component analysis (PCA), phylogenetic tree construction, and ADMIXTURE, revealed clear phylogenetic relationships among the world's domestic sheep and their close wild relatives. Additionally, pangenome-wide presence/absence variation (PAV) analysis showed a decreasing trend in gene number from wild to domestic populations. PAV selection analysis identified several genes, including GZMH, NFE2L3, GPR146, and CALHM6, whose presence frequencies changed significantly during the evolutionary history of sheep. Functional annotation revealed that these genes were primarily associated with immune responses. Our results highlight the value of the sheep pan-genome for identifying previously unknown genetic variation. These findings broaden our knowledge of the genetic diversity in sheep genomes and provide insight into the domestication and breeding history of sheep.
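The PAV comparison between wild and domestic groups can be illustrated with a minimal sketch. The abstract does not say which statistical test the authors used for PAV selection analysis, so the two-sided Fisher's exact test below is an assumption chosen for illustration. The function names (`presence_frequency`, `fisher_exact_two_sided`, `pav_shift`) and the toy presence/absence calls are hypothetical and are not taken from the paper.

```python
from math import comb


def presence_frequency(calls):
    """Fraction of individuals carrying the gene (calls are 0 or 1)."""
    return sum(calls) / len(calls)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def pav_shift(wild_calls, domestic_calls):
    """Compare gene presence between two groups (assumed test, see lead-in)."""
    a = sum(wild_calls)
    b = len(wild_calls) - a
    c = sum(domestic_calls)
    d = len(domestic_calls) - c
    return {
        "wild_freq": presence_frequency(wild_calls),
        "domestic_freq": presence_frequency(domestic_calls),
        "p_value": fisher_exact_two_sided(a, b, c, d),
    }


# Toy presence/absence calls, invented for illustration only.
toy_pav = {
    "gene_A": ([1, 1, 1, 1, 1, 1, 1, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
    "gene_B": ([1, 1, 0, 1, 1, 0, 1, 1], [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]),
}

for gene, (wild, domestic) in toy_pav.items():
    print(gene, pav_shift(wild, domestic))
```

In a real analysis each gene would be tested across all 801 samples, and the resulting p-values would need multiple-testing correction. That step is omitted here.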
Source journal: Genomics
Category: Biology, Biotechnology and Applied Microbiology
CiteScore: 9.60
Self-citation rate: 2.30%
Articles published: 260
Review time: 60 days
About the journal: Genomics is a forum for describing the development of genome-scale technologies and their application to all areas of biological investigation. As a journal that has evolved with the field that carries its name, Genomics focuses on the development and application of cutting-edge methods, addressing fundamental questions with potential interest to a wide audience. Our aim is to publish the highest quality research and to provide authors with rapid, fair and accurate review and publication of manuscripts falling within our scope.