Optimizing genotype imputation pipeline for low-coverage whole genome sequencing data in spotted sea bass and its application in genomic prediction
Chong Zhang, Yonghang Zhang, Pengyu Li, Cong Liu, Lingyu Wang, Yani Dong, Donglei Sun, Xin Qi, Haishen Wen, Kaiqiang Zhang, Shaosen Yang, Yun Li
Aquaculture Reports, Volume 45, Article 103088. Published 2025-09-09. Impact factor: 3.7. JCR: Q1 (Fisheries).
DOI: 10.1016/j.aqrep.2025.103088
https://www.sciencedirect.com/science/article/pii/S2352513425004740
Citations: 0
Abstract
Genotype imputation from low-coverage whole genome sequencing (lcWGS) data offers a cost-effective approach for genotyping large populations, with significant potential to accelerate genomic selection in breeding programs. For spotted sea bass (Lateolabrax maculatus), genetic improvement is urgently required due to the degeneration of genetic traits and long generation intervals. However, the high cost of high-coverage WGS (hcWGS) for large populations has delayed breeding progress. To address this gap, the present study comprehensively evaluated genotype imputation for lcWGS data down-sampled from 1107 individuals across four hcWGS datasets, with the aim of developing an efficient lcWGS-based imputation pipeline for spotted sea bass. Initially, the 100data dataset was selected for a preliminary assessment of the performance of various imputation pipelines. BEAGLE was excluded due to its lower accuracy and excessive computational requirements, while STITCH and GLIMPSE2 were retained for subsequent analyses. The effects of reference and target data on GLIMPSE2 imputation were then evaluated, showing that the optimal strategy for constructing the reference panel prioritizes population genetic diversity over sample size to maximize imputation accuracy. The evaluation also highlighted the critical role of population structure, genetic relatedness, and linkage disequilibrium (LD) level between reference and target data for imputation accuracy. Additionally, the imputation accuracy of STITCH and GLIMPSE2 was compared across three datasets, with GLIMPSE2 imputation using the optimal reference panel emerging as the most effective imputation pipeline for spotted sea bass. Finally, we demonstrated that lcWGS data combined with GLIMPSE2 imputation achieves predictive accuracy comparable to hcWGS data in genomic prediction.
Our study presents an optimized workflow to impute lcWGS data in spotted sea bass and establishes the first publicly available reference panel with the highest known genetic diversity. This resource lays a crucial foundation for future genomic selection and breeding programs and serves as a valuable reference for genotype imputation in other aquaculture species.
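The abstract does not say how imputation accuracy was scored. As an illustration only, two metrics commonly used when imputed lcWGS genotypes are compared against hcWGS calls for the same individuals are genotype concordance and the squared Pearson correlation (r²) between true genotypes and imputed dosages. The sketch below assumes genotypes coded as 0/1/2 alternate-allele counts; the function names and the toy data are hypothetical and are not taken from the paper.

```python
# Minimal sketch of two common imputation-accuracy metrics.
# Assumption: genotypes are coded as alternate-allele counts (0, 1, 2);
# the metrics actually used in the study are not given in the abstract.

def genotype_concordance(truth, imputed):
    """Fraction of sites where the imputed hard call equals the true call.

    Sites where either value is None (missing) are skipped.
    """
    pairs = [(t, i) for t, i in zip(truth, imputed)
             if t is not None and i is not None]
    if not pairs:
        raise ValueError("no comparable genotypes")
    return sum(t == i for t, i in pairs) / len(pairs)


def dosage_r2(truth, dosage):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Returns NaN when either vector has zero variance (monomorphic site).
    """
    if len(truth) != len(dosage) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(truth)
    mean_t = sum(truth) / n
    mean_d = sum(dosage) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(truth, dosage))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_d = sum((d - mean_d) ** 2 for d in dosage)
    if var_t == 0 or var_d == 0:
        return float("nan")
    return (cov * cov) / (var_t * var_d)


# Toy data for one individual across six sites (hypothetical values).
true_calls = [0, 1, 2, 1, 0, 2]      # hcWGS genotypes treated as truth
imputed_calls = [0, 1, 2, 0, 0, 2]   # hard calls after imputation
imputed_dosages = [0.1, 0.9, 1.8, 0.6, 0.0, 2.0]

concordance = genotype_concordance(true_calls, imputed_calls)
r2 = dosage_r2(true_calls, imputed_dosages)
```

Concordance tends to look high at rare variants simply because most genotypes are homozygous reference, so r² is often reported alongside it; either metric could be computed per site instead of per individual by transposing the inputs.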
Aquaculture Reports
Subject area: Agricultural and Biological Sciences, Animal Science and Zoology
CiteScore: 5.90
Self-citation rate: 8.10%
Articles published: 469
Review time: 77 days
About the journal:
Aquaculture Reports will publish original research papers and reviews documenting outstanding science with a regional context and focus, answering the need for high quality information on novel species, systems and regions in emerging areas of aquaculture research and development, such as integrated multi-trophic aquaculture, urban aquaculture, ornamental, unfed aquaculture, offshore aquaculture and others. Papers having industry research as priority and encompassing product development research or current industry practice are encouraged.