Genotype imputation accuracy of X chromosome variants in Holstein cattle based on different software and imputation strategies

IF 4.4 | Zone 1, Agricultural and Forestry Sciences | Q1, AGRICULTURE, DAIRY & ANIMAL SCIENCE
Tatiana C de Souza, Luis F B Pinto, Valdecy A R da Cruz, Tatiane S Chud, Victor B Pedrosa, Gerson A O Junior, Hinayah R de Oliveira, Henrique A Mulim, Filippo Miglior, Flávio S Schenkel, Luiz F Brito
Journal of Dairy Science. Published 2025-09-18. DOI: 10.3168/jds.2025-26715 (https://doi.org/10.3168/jds.2025-26715)
Citations: 0

Abstract

Genotype imputation accuracy of X chromosome variants in Holstein cattle based on different software and imputation strategies.

The X chromosome is one of the largest in the cattle genome, but little is known about the imputation of X chromosome variants. Thus, the main objective of this study was to assess the imputation accuracy of SNPs located in the X chromosome based on different strategies. Data from 2,505 Holstein cattle were used, and the imputation was carried out in 2 steps. Step 1 consisted of imputation from 5 medium-density (MD) SNP panels to a consolidated MD SNP panel, and step 2 was based on imputation from this consolidated MD SNP panel to a high-density (HD) SNP panel. Six scenarios (S1-S6) were evaluated for imputing autosomal SNPs (S1^A, S2^A, S3^A, S4^A, S5^A, S6^A), as well as the entire X chromosome (S1^X, S2^X, S3^X, S4^X, S5^X, S6^X) and the pseudoautosomal region (PAR; S1^PAR, S2^PAR, S3^PAR, S4^PAR, S5^PAR, S6^PAR) and non-PAR (S1^non-PAR, S2^non-PAR, S3^non-PAR, S4^non-PAR, S5^non-PAR, S6^non-PAR) segments of the X chromosome. The validation population in all these scenarios had 169 females and zero (S1, S2, and S3) or 583 (S4, S5, and S6) males, whereas the reference population had 169 (S2, S5) or 392 (S1, S3, S4, S6) females and zero (S1, S4), 196 (S2, S5), or 1,361 (S3, S6) males. Two imputation software tools (Minimac and FindHap) were compared across scenarios. Step 1 provided a consolidated MD SNP panel containing 2,132 and 63,259 SNPs located on the X and autosomal chromosomes, respectively, and step 2 resulted in an HD SNP panel with 5,921 and 294,865 SNPs located on the X and autosomal chromosomes, respectively. In step 1, the lowest average allelic correlation (R) was 0.93 (S4^PAR) with Minimac and 0.79 (S4^PAR) with FindHap, whereas the lowest genotypic concordance rate (CR) was 95.0 (S4^PAR) with Minimac and 85.0 (S4^PAR) when using FindHap. In step 2, the lowest R was 0.93 (S4^PAR and S4^non-PAR) with Minimac and 0.66 (S4^X) with FindHap, whereas the lowest CR was 96.2 (S4^PAR) with Minimac and 80.3 (S4^X) with FindHap.
In general, all the scenarios had high imputation accuracy of the X chromosome SNPs when using the Minimac software, whereas FindHap showed better accuracy with scenarios S3 and S6. Including both males and females in the reference and validation populations increased the imputation accuracy of X chromosome variants. These findings highlight the importance of the choice of the imputation software and the need for enlarging the reference populations to increase genotype imputation accuracy of the X chromosome variants in Holstein cattle.
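
The abstract reports two accuracy measures, allelic correlation (R) and genotypic concordance rate (CR), without giving their formulas. The sketch below illustrates one common way such measures are computed for a single SNP, assuming genotypes are coded as allele counts (0, 1, 2), R is the Pearson correlation between true and imputed counts, and CR is the percentage of identical genotype calls. The function names and toy data are hypothetical, and the authors' exact definitions (including how hemizygous male genotypes in the non-PAR segment are coded) may differ.

```python
from math import sqrt


def allelic_correlation(true_gt, imputed_gt):
    """Pearson correlation between true and imputed allele counts (assumed definition of R)."""
    n = len(true_gt)
    if n == 0 or n != len(imputed_gt):
        raise ValueError("genotype vectors must be non-empty and of equal length")
    mean_t = sum(true_gt) / n
    mean_i = sum(imputed_gt) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_gt, imputed_gt))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_i = sum((i - mean_i) ** 2 for i in imputed_gt)
    if var_t == 0 or var_i == 0:
        # Correlation is undefined when either vector has no variation.
        return float("nan")
    return cov / sqrt(var_t * var_i)


def concordance_rate(true_gt, imputed_gt):
    """Percentage of genotype calls that match exactly (assumed definition of CR)."""
    n = len(true_gt)
    if n == 0 or n != len(imputed_gt):
        raise ValueError("genotype vectors must be non-empty and of equal length")
    matches = sum(1 for t, i in zip(true_gt, imputed_gt) if t == i)
    return 100.0 * matches / n


# Hypothetical toy data: one SNP genotyped in eight animals.
true_gt = [0, 1, 2, 2, 1, 0, 1, 2]
imputed_gt = [0, 1, 2, 1, 1, 0, 1, 2]

print(concordance_rate(true_gt, imputed_gt))  # 87.5 (7 of 8 calls match)
print(round(allelic_correlation(true_gt, imputed_gt), 3))
```

In a study like this one, per-SNP values would typically be averaged within each scenario and chromosome segment, which is presumably what "average allelic correlation" refers to. The two measures can disagree: CR counts only exact matches, whereas R also reflects how far a wrong call is from the true allele count.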

Source journal
Journal of Dairy Science (Agricultural and Forestry Sciences: Dairy and Animal Science)
CiteScore: 7.90
Self-citation rate: 17.10%
Articles published: 784
Review time: 4.2 months
About the journal: The official journal of the American Dairy Science Association®, Journal of Dairy Science® (JDS) is the leading peer-reviewed general dairy research journal in the world. JDS readers represent education, industry, and government agencies in more than 70 countries with interests in biochemistry, breeding, economics, engineering, environment, food science, genetics, microbiology, nutrition, pathology, physiology, processing, public health, quality assurance, and sanitation.