Identifying low-density, ancestry-informative SNP markers through whole genome resequencing in Indian, Chinese, and wild yak.

IF 3.5 | Zone 2, Biology | JCR Q2, Biotechnology & Applied Microbiology

BMC Genomics Pub Date : 2024-11-05 DOI:10.1186/s12864-024-10924-9

Munish Gangwar, Sheikh Firdous Ahmad, Abdul Basit Ali, Amit Kumar, Amod Kumar, Gyanendra Kumar Gaur, Triveni Dutt

BMC Genomics 25(1):1043. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11539683/pdf/

Citations: 0

Abstract

This study aimed to identify population-stratifying and ancestry-informative markers in Indian, Chinese, and wild yak populations using whole genome resequencing (WGS) data, comparing three selection strategies (Delta, pairwise Wright's fixation index (F_ST), and Informativeness of Assignment) across marker densities of 5,000 to 25,000 SNPs. The study used WGS data on 105 individuals from three separate yak cohorts: Indian yak (n = 29), Chinese yak (n = 61), and wild yak (n = 15). Variant calling in GATK with strict quality control yielded 1,002,970 high-quality, independent (LD-pruned) SNP markers across the yak autosomes. Markers were ranked in the Toolbox for Ranking and Evaluation of SNPs (TRES) program under each of the three criteria to identify population-stratifying and ancestry-informative markers across various datasets. The top-ranked 5,000 (5K), 10,000 (10K), 15,000 (15K), 20,000 (20K), and 25,000 (25K) SNPs were identified from each dataset, and their composition and performance were assessed using different criteria. With the full-density dataset (105 individuals, 1,002,970 markers), the average genomic breed clustering of the Indian, Chinese, and wild yak cohorts was 81.74%, 80.02%, and 83.62%, respectively. The Informativeness of Assignment criterion at 10K density emerged as the best combination for the three yak cohorts, with 86.94%, 96.46%, and 98.20% clustering for Indian, Chinese, and wild yak, respectively. There was an average increase of 7.56%, 22.72%, and 30.35% in the genomic breed clustering scores of the Indian, Chinese, and wild yak cohorts over the estimates from the original dataset. The selected markers overlapped multiple protein-coding genes within a 10 kb window, including ADGRB3, ANK1, CACNG7, CALN1, CHCHD2, CREBBP, GLI3, KHDRBS2, and OSBPL10.
This is the first report of low-density SNP marker sets with population-stratifying and ancestry-informative properties in three yak groups based on WGS data. The results are relevant to applying genomic selection with cost-effective low-density SNP panels in yak worldwide.
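
The three ranking criteria named in the abstract can be illustrated with a minimal Python sketch for biallelic SNPs. It uses common textbook forms: Delta as the absolute allele-frequency difference between two populations, a Wright-style F_ST computed from expected heterozygosities, and the informativeness for assignment (I_n) as defined by Rosenberg et al. (2003). The abstract does not state the exact estimators implemented in TRES, so the formulas, the equal weighting of populations, the averaging of pairwise scores over cohort pairs, and the toy allele frequencies below are all assumptions made for illustration, not the authors' pipeline.

```python
import math
from itertools import combinations


def delta(p1, p2):
    """Absolute difference in reference-allele frequency between two populations."""
    return abs(p1 - p2)


def fst_pair(p1, p2):
    """Wright-style F_ST for one biallelic SNP, assuming equal population weights."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                          # monomorphic in both populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_t - h_s) / h_t


def _xlogx(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def informativeness(freqs):
    """I_n for a biallelic SNP, given one allele frequency per population."""
    k = len(freqs)
    total = 0.0
    # Sum over both alleles: the listed one and its complement.
    for allele in (freqs, [1 - p for p in freqs]):
        p_bar = sum(allele) / k
        total += -_xlogx(p_bar) + sum(_xlogx(p) for p in allele) / k
    return total


def mean_pairwise(score, freqs):
    """Average a two-population score over all population pairs (an assumption)."""
    pairs = list(combinations(freqs, 2))
    return sum(score(a, b) for a, b in pairs) / len(pairs)


def top_k(snp_freqs, score, k):
    """Return the k SNP ids with the highest score, best first."""
    return sorted(snp_freqs, key=lambda s: score(snp_freqs[s]), reverse=True)[:k]


# Hypothetical allele frequencies in three cohorts (not from the study).
snp_freqs = {
    "snp1": [0.9, 0.1, 0.5],
    "snp2": [0.5, 0.5, 0.5],
    "snp3": [1.0, 0.0, 0.0],
}

ranked_by_in = top_k(snp_freqs, informativeness, 2)
ranked_by_delta = top_k(snp_freqs, lambda f: mean_pairwise(delta, f), 2)
ranked_by_fst = top_k(snp_freqs, lambda f: mean_pairwise(fst_pair, f), 2)
```

In this toy example a SNP with identical frequencies in every cohort (`snp2`) scores zero under all three criteria, while a SNP fixed for different alleles in different cohorts (`snp3`) ranks first. A low-density panel in the spirit of the abstract would keep the top 5,000 to 25,000 markers from such a ranking.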


Source journal: BMC Genomics (Biology: Biotechnology & Applied Microbiology)
CiteScore: 7.40
Self-citation rate: 4.50%
Articles published: 769
Review time: 6.4 months

Journal description: BMC Genomics is an open access, peer-reviewed journal that considers articles on all aspects of genome-scale analysis, functional genomics, and proteomics. BMC Genomics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.