Haplotype resolved chromosome-level genome assembly of the gold barb (Barbodes semifasciolatus)
Weitao Chen, Chao Li, Rong Yang, Yuefei Li, Baosheng Wu, Jie Li
Scientific Data, volume 12, issue 1, article 902. Published 2025-05-29.
DOI: 10.1038/s41597-025-05178-3 (https://doi.org/10.1038/s41597-025-05178-3)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12122775/pdf/
Abstract
The gold barb (Barbodes semifasciolatus), a member of the Cyprinidae family, exhibits remarkable adaptability to highly acidic environments, making it an ideal model for studying extreme environmental adaptation. However, its genome has not been previously characterized. To address this, we assembled a high-quality chromosome-scale genome for B. semifasciolatus using High-Fidelity (HiFi) sequencing and Hi-C technology. The resulting haplotype-resolved assemblies, spanning 776 Mb and 779 Mb across 25 chromosomes, achieved genome coverages of 99.5% and 99.7%, respectively, and included four gap-free chromosomes. Genome quality assessment using BUSCO indicated high completeness scores of 98.2% for haplotype 1 and 98.3% for haplotype 2. Strong synteny with the zebrafish (Danio rerio) genome further supported the assembly's integrity and continuity. Through integration of full-length transcriptome data, RNA sequencing, and homology-based annotation, we identified 25,622 protein-coding genes with 2,101 pseudogenes in haplotype 1, and 26,057 protein-coding genes with 2,087 pseudogenes in haplotype 2. This high-resolution genome assembly is a crucial resource for advancing research in the Cyprinidae, particularly for understanding adaptive evolution in extreme environments.
About the journal:
Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data.
The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.