BMC genomic data | Pub Date: 2025-05-28 | DOI: 10.1186/s12863-025-01330-5
Hayley Goss, Paige Miller, Susan F Zaleski, Robert J Miller, Donna M Schroeder, Henry M Page
{"title":"Draft genome assembly for the purple-hinged rock scallop (Crassadoma gigantea).","authors":"Hayley Goss, Paige Miller, Susan F Zaleski, Robert J Miller, Donna M Schroeder, Henry M Page","doi":"10.1186/s12863-025-01330-5","DOIUrl":"10.1186/s12863-025-01330-5","url":null,"abstract":"","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"26 1","pages":"39"},"PeriodicalIF":1.9,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12121003/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144176070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BMC genomic data | Pub Date: 2025-05-27 | DOI: 10.1186/s12863-025-01323-4
Oluyinka Opoola, Felicien Shumbusho, Innocent Rwamuhizi, Isidore Houaga, David Harvey, David Hambrook, Kellie Watson, Mizeck G G Chagunda, Raphael Mrode, Appolinaire Djikeng
{"title":"The genetic structure and diversity of smallholder dairy cattle in Rwanda.","authors":"Oluyinka Opoola, Felicien Shumbusho, Innocent Rwamuhizi, Isidore Houaga, David Harvey, David Hambrook, Kellie Watson, Mizeck G G Chagunda, Raphael Mrode, Appolinaire Djikeng","doi":"10.1186/s12863-025-01323-4","DOIUrl":"10.1186/s12863-025-01323-4","url":null,"abstract":"<p>Previous genomic characterisation of Rwanda dairy cattle predominantly focused on the One Cow per Poor Family (locally called \"Girinka\") programme. However, smallholder farmers in Rwanda have benefited from other livestock initiatives and development programmes. Capturing and documenting this genetic diversity is critical, in part as a key contribution to the genomic resources required to support dairy development in Rwanda. A total of 2,229 crossbred animals located in all dairy-producing regions of Rwanda were sampled. For each animal, a hair sample was collected and genotyped using the Geneseek Genomic Profiler (GGP, Neogen Geneseek<sup>®</sup>) Bovine 50 K (n = 1,917) and GGP Bovine 100 K arrays (n = 312). The combined dataset was subjected to quality control and data curation for use in population genetic and genomic analyses. To assess the genetic structure and diversity of the current population, key analyses for population structure were applied: Principal Component Analysis (PCA), population structure and diversity, admixture analysis, measures of heterozygosity, runs of homozygosity (ROH) and minor allele frequency (MAF). A dataset of global dairy populations of European taurine, African indicus and African taurus cattle (n = 250) was used as a reference. Results showed that the Rwanda cattle population is a highly admixed mix of diverse pure and crossbred animals, with an average MAF of 33% (standard error, se = 0.001) and foreign high-yielding (taurine) dairy breed ancestry proportions of 18% Jersey Island, 12% non-Island Jersey and 42% Holstein-Friesian. 
Two African Bos taurus and five Bos indicus breeds together contributed 28% of the ancestry. Genetic distances were highest between Gir and N'dama (0.29) and between Nelore and N'dama (0.29). There were 1,331 ROH regions, and average heterozygosity was high for Rwanda cattle (0.41; se = 0.001). Aside from well-established genes in cattle, we found evidence for a variety of novel and lesser-known genes under selection that are associated with fertility, milk production, innate immunity and environmental adaptation. This observed diversity offers an opportunity to decipher the presence and/or lack of genetic variation in order to initiate short- and long-term breed improvement programmes for adaptation traits, disease resistance, heat tolerance, productivity and profitability of smallholder dairy systems in Rwanda.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"26 1","pages":"38"},"PeriodicalIF":1.9,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12107919/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144163855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Whole-genome sequencing of global forest pathogen Diplodia sapinea causing pine shoot blight.","authors":"QuanChao Wang, FeiFei Liu, HuaChao Xu, XuDong Zhou","doi":"10.1186/s12863-025-01328-z","DOIUrl":"10.1186/s12863-025-01328-z","url":null,"abstract":"<p><strong>Objective: </strong>The pathogenic fungus Diplodia sapinea is of significant importance due to its primary role inducing tip dieback on various Pinus species which are widely distributed throughout the world. The objective of this study is to further provide comprehensive and specific resources for genome assembly and sequence annotation of this important forest pathogen from China, thereby establishing a robust foundation for future studies on its systematics, population genetics, genomics and global movement.</p><p><strong>Data description: </strong>A high-quality genome of D. sapinea strain ZXD319 was sequenced utilizing the Nanopore PromethION and BGI DNBSEQ-T7 platforms. The assembled genome spans a total length of 36.81 Mb, comprising 14 contigs, with a GC content of 56.80% and an N50 value of 2,972,533 bp. It encompasses 11,200 protein-coding genes and 252 noncoding RNAs. The predicted genes were annotated against multiple public databases, and 1,611 potential virulence genes were identified through the Pathogen Host Interactions (PHI) database. Furthermore, the genome comparative analysis of D. sapinea and related species revealed 11,568 gene clusters and 3,436 single-copy clusters. Phylogenetic analysis indicated a close evolutionary relationship between D. sapinea with D. corticola and D. seriata. 
The genomic data presented herein serve as a valuable resource for future studies on this globally important pathogen.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"26 1","pages":"37"},"PeriodicalIF":1.9,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12107918/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144153001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
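The N50 quoted in the data description above (2,972,533 bp across 14 contigs) is the length of the contig at which the cumulative length, counted from the longest contig downwards, first reaches half of the total assembly size. The following is a minimal Python sketch of that calculation. The function name `n50` and the toy contig lengths are illustrative assumptions; they are not taken from the ZXD319 assembly, whose individual contig lengths are not given in this record.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half_total = sum(ordered) / 2             # half of the assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # crossed the halfway point
            return length


# Hypothetical toy assembly (not real data): total = 300, half = 150.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150 reaches half, so N50 is 70
```

Sorting longest first is what makes N50 a contiguity measure: a few long contigs raise it, whereas many short contigs of the same total length lower it.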
BMC genomic data | Pub Date: 2025-05-22 | DOI: 10.1186/s12863-025-01329-y
Manuel Zúñiga, Cristina Alcántara, Ángela Peirotén, Luis Andrés Ramón-Nuñez, Vicente Monedero, José María Landete
{"title":"The first complete genome of Fructilactobacillus vespulae: strain Mu01, isolated from nectar of Musa paradisiaca L.","authors":"Manuel Zúñiga, Cristina Alcántara, Ángela Peirotén, Luis Andrés Ramón-Nuñez, Vicente Monedero, José María Landete","doi":"10.1186/s12863-025-01329-y","DOIUrl":"10.1186/s12863-025-01329-y","url":null,"abstract":"<p><strong>Objectives: </strong>Lactobacillales, commonly known as lactic acid bacteria (LAB), is an order of Gram-positive, facultatively anaerobic or microaerophilic bacteria characterized by their ability to ferment carbohydrates and produce lactic acid as a major metabolic byproduct. Many species within this group have significant roles in food fermentation, human health, and industrial applications. Here, we report the complete genome sequence of Fructilactobacillus vespulae Mu01, the first sequenced genome of this species. The complete genome sequence of F. vespulae Mu01 is expected to provide valuable insights into the genetics and metabolism of this little-characterized species.</p><p><strong>Data description: </strong>A novel strain of Fructilactobacillus vespulae was isolated from nectar of Musa paradisiaca L. during a survey for LAB associated with wild and cultivated plants in the metropolitan area of Valencia, Spain. A complete genome was obtained by sequencing with Nanopore long read technology. The genome consists of a chromosome of 1506092 bp and a plasmid of 42437 bp, presenting a GC content of 36 % and 31 %, respectively. 
The genome includes 1541 genes, with 1450 CDSs, 7 pseudogenes, 18 rRNA encoding genes, 63 tRNAs and 3 ncRNAs.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"26 1","pages":"36"},"PeriodicalIF":1.9,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12101011/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144129667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational prediction of deleterious nonsynonymous SNPs in the CTNS gene: implications for cystinosis.","authors":"Leila Adda Neggaz, Amira Chahinez Dahmani, Ibtissem Derriche, Nawel Adda Neggaz, Abdallah Boudjema","doi":"10.1186/s12863-025-01325-2","DOIUrl":"https://doi.org/10.1186/s12863-025-01325-2","url":null,"abstract":"<p><strong>Background: </strong>Cystinosis is a rare autosomal recessive lysosomal storage disorder caused by mutations in the CTNS gene, which encodes cystinosin, a lysosomal cystine transporter. These mutations disrupt cystine efflux, leading to its accumulation in lysosomes and subsequent cellular damage. While more than 140 mutations have been identified, the functional and structural impacts of many nonsynonymous single nucleotide polymorphisms (nsSNPs) remain poorly understood. Nonsynonymous SNPs are of particular interest because they can directly alter protein structure and function, potentially leading to disease. Clinically, cystinosis most often presents with renal Fanconi syndrome, photophobia and vision loss due to corneal cystine crystals, and progressive neuromuscular complications such as distal myopathy and swallowing difficulties. This study aimed to identify deleterious nsSNPs in the CTNS gene and evaluate their effects on cystinosin stability, structure, and function via computational tools and molecular dynamics simulations.</p><p><strong>Results: </strong>From a dataset of 12,028 SNPs, 327 nsSNPs were identified, among which 19 were consistently classified as deleterious across multiple predictive tools, including SIFT, PolyPhen, and molecular dynamics simulations. Stability predictions revealed that most of these mutations destabilize cystinosin, with G308R and G308V, located in the sixth transmembrane domain essential for transporter function, having the most severe effects. 
Molecular dynamics simulations revealed that these mutations significantly increase local flexibility, alter hydrogen bonding patterns, and enhance solvent accessibility, resulting in structural perturbations. Notably, D305G and F142S disrupted the transmembrane domains essential for the function of cystinosin, whereas compared with the wild-type protein, G309V resulted in increased stability. Conservation analysis revealed that 16 of the 19 mutations affected highly conserved residues, indicating their crucial roles in the function of cystinosin. Additionally, protein interaction analyses suggested that mutations could impact associations with lysosomal and membrane transport proteins.</p><p><strong>Conclusions: </strong>This study identified 19 deleterious nsSNPs in the CTNS gene that impair cystinosin stability and function. These findings highlight the structural and functional importance of key residues, such as G308, D305, and F142, which play critical roles in maintaining the active conformation and transport capacity of cystinosin. These insights provide a foundation for future experimental validation and the development of targeted therapeutic strategies to mitigate the effects of pathogenic mutations in cystinosis.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"26 1","pages":"35"},"PeriodicalIF":1.9,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12079974/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144082491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Molecular characterization and phylogenetic analyses of the mitogenome of Wan-Xi white goose, a native goose breed in China.","authors":"Lunbin Xia, Shaoshuai Bi, Yafei Zhang, Cunwu Chen, Naidong Chen","doi":"10.1186/s12863-025-01326-1","DOIUrl":"10.1186/s12863-025-01326-1","url":null,"abstract":"<p><strong>Background: </strong>The Wan-Xi white goose (WXG), an indigenous Chinese waterfowl (Anserini: Anserinae), is crucial for goose germplasm conservation. This study aimed to sequence and analyze the complete mitochondrial DNA (mtDNA) of WXG using the BGISEQ-500 platform. The mtDNA's structure and function were investigated to gain insights into its genetic diversity and population structure.</p><p><strong>Results: </strong>The mtDNA was found to be 16,743 bp long and comprised 22 transfer RNA (tRNA) genes, 2 ribosomal RNA genes, a complement of 13 protein-coding genes (PCGs), as well as a single noncoding control region known as the D-loop. Notably, all tRNA genes, except for trnS1-tRNA which lacked the dihydrouridine stem, were predicted to adopt the typical cloverleaf structure. Given the genetic variability across the mtDNA of Anser spp. and the intergenic gaps identified by codon analysis, the codon usage patterns were comprehensively examined via comparative analysis of the mtDNAs of WXG and 24 other Anser spp. The relative synonymous codon usage (RSCU) values of the 13 mitochondrial PCGs of WXG were consistent with those of the mitochondrial PCGs of the 24 other Anser spp. Analysis of the neutrality (GC3-GC12), the effective number of codons (ENCs)-GC3, and parity rule 2-bias plots further revealed that natural selection emerged as the primary factor influencing codon bias in Anser sp. High nucleotide diversity (Pi > 0.02) was observed in several regions, including the D-loop, ATP6, 12S rRNA, ND1, 16S rRNA_ND1, COX2, and ND5. 
Furthermore, the results of nonsynonymous (Ka)/synonymous (Ks) analysis of the 13 mitochondrial PCGs of the 25 species under Anser revealed that the genes were subject to strong purifying selection. The findings of phylogenetic analysis further revealed that WXG and 10 other members of Anser cygnoides clustered into a single branch to form a monophyletic group.</p><p><strong>Conclusion: </strong>This research provides valuable insights into the mtDNA of WXG, highlighting its genetic diversity and population structure. The identified mutation hotspots and purifying selection on mitochondrial PCGs suggest potential areas for future research on Anser cygnoides. The findings contribute to our understanding of this rare species and its conservation efforts.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"26 1","pages":"34"},"PeriodicalIF":1.9,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12070641/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144051284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BMC genomic data | Pub Date: 2025-05-01 | DOI: 10.1186/s12863-025-01306-5
Liuheyi Ma, Xiaoyu Zuo, Bingtai Lu, Yuxia Zhang
{"title":"Correlation of METTL4 genetic variants and severe pneumonia pediatric patients in Southern China.","authors":"Liuheyi Ma, Xiaoyu Zuo, Bingtai Lu, Yuxia Zhang","doi":"10.1186/s12863-025-01306-5","DOIUrl":"https://doi.org/10.1186/s12863-025-01306-5","url":null,"abstract":"<p><strong>Background: </strong>Pneumonia is a major cause of mortality and health burden in children under five, yet its genetic etiology remains poorly understood. Methyltransferase 4, N6-adenosine (METTL4), is a methyltransferase enzyme responsible for RNA and DNA methylation and is known to be activated under hypoxic conditions. However, its potential link to susceptibility to pneumonia has not been evaluated. This study aimed to explore candidate regulatory single nucleotide polymorphisms (SNPs) within the METTL4 gene and their association with the development of severe pneumonia.</p><p><strong>Results: </strong>In this study, we recruited a cohort of 1034 children with severe pneumonia and 8426 healthy controls. We investigated the associations of candidate regulatory SNPs within METTL4 with severe pneumonia. Our results indicated that the C allele of rs9989554 (P = 0.00023, OR = 1.21, 95% CI: 1.09-1.34) and the G allele of rs16943442 (P = 0.0026, OR = 1.22, 95% CI: 1.07-1.38) were significantly associated with an increased risk of severe pneumonia. The regulatory potential of these two SNPs in the lung was investigated using resources such as expression quantitative trait locus (eQTL) data, RegulomeDB, and FORGEdb.</p><p><strong>Conclusions: </strong>This study represents the first investigation elucidating the role of genetic variations in the METTL4 gene and their influence on susceptibility to severe pneumonia in pediatric populations. METTL4 is identified as a novel predisposing gene for severe pneumonia and a potential therapeutic target. 
Further research is warranted to validate this correlation and to comprehensively elucidate the biological role of the METTL4 gene in severe pneumonia.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"26 1","pages":"33"},"PeriodicalIF":1.9,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12044828/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144043575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
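The odds ratios, 95% confidence intervals and P values reported in the record above can be cross-checked against one another. Under a Wald (normal) approximation on the log-odds scale, the standard error of ln(OR) is recoverable from the width of the confidence interval, and a two-sided P value follows from z = ln(OR) / SE. The Python sketch below does this for the two SNPs. It is an approximate consistency check only: the inputs are the rounded figures from the abstract, the function names are illustrative, and the Wald approximation is an assumption, since the record does not state which test or covariates the authors used.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval


def se_log_or_from_ci(lower, upper, z_crit=Z_95):
    """Standard error of ln(OR), back-calculated from a confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def wald_p_from_or_ci(odds_ratio, lower, upper):
    """Approximate two-sided Wald P value implied by an OR and its 95% CI."""
    se = se_log_or_from_ci(lower, upper)
    z = math.log(odds_ratio) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal
    return math.erfc(abs(z) / math.sqrt(2))


# Rounded values as reported in the abstract: (SNP, OR, CI lower, CI upper)
reported = [
    ("rs9989554", 1.21, 1.09, 1.34),
    ("rs16943442", 1.22, 1.07, 1.38),
]
for snp, odds_ratio, lower, upper in reported:
    p_value = wald_p_from_or_ci(odds_ratio, lower, upper)
    print(f"{snp}: implied two-sided P ~ {p_value:.2g}")
```

Because the inputs are rounded to two decimal places, the implied P values should be expected to match the reported ones only in order of magnitude, not digit for digit.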
BMC genomic data | Pub Date: 2025-04-23 | DOI: 10.1186/s12863-025-01322-5
Xiangyu Zhang, Kai Zhang, Dengping Huang, Shangjun Yang, Min Zhang, Qin Yin
{"title":"Comprehensive transcriptome of muscle development in Sichuan white rabbit.","authors":"Xiangyu Zhang, Kai Zhang, Dengping Huang, Shangjun Yang, Min Zhang, Qin Yin","doi":"10.1186/s12863-025-01322-5","DOIUrl":"https://doi.org/10.1186/s12863-025-01322-5","url":null,"abstract":"<p><strong>Background: </strong>The Sichuan white rabbit is a unique domestic breed and is famous for its high meat production. Muscle development is a complicated biological process, but the underlying regulatory mechanisms have not been elucidated. Here, we generated comprehensive transcriptome datasets (i.e., mRNAs, miRNAs and lncRNAs) in three developmental stages of Sichuan white rabbits, and aim to systematically explore the regulatory network in myogenesis.</p><p><strong>Results: </strong>We generated extensive transcriptome datasets (mRNAs, miRNAs and lncRNAs) revealing the myogenic regulatory network at different time points. Our differential expression analysis identified 2,995 DE genes, 1,211 DE-lncRNAs, and 305 DE-miRNAs with distinct expression patterns across developmental stages. In addition, functional enrichment analysis of DE mRNAs and miRNAs indicates their involvement in muscle growth, development, and regeneration, highlighting biological processes and muscle-specific functions. Interaction analysis between DE-lncRNAs and mRNAs uncovered a complex regulatory network, especially between 21 and 27 days of development. These findings contribute to better understanding of the transcriptomic changes during muscle development and have implications for breeding improvement in Sichuan white rabbits.</p><p><strong>Conclusions: </strong>Our study provides a comprehensive overview of the transcriptomic changes during muscle development in Sichuan white rabbits. The identification and functional annotation of DE genes, miRNAs, and lncRNAs provide valuable insights into the molecular mechanisms underlying this process. 
These findings pave the way for targeted investigations into the role of non-coding RNAs in muscle biology.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"26 1","pages":"32"},"PeriodicalIF":1.9,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12016129/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144042139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BMC genomic data | Pub Date: 2025-04-18 | DOI: 10.1186/s12863-025-01319-0
Eva Kjæld Hansen, Jens Ivan Í Gerðinum, Dag Inge Våge, Svein-Ole Mikalsen
{"title":"Faroese sheep expand overall global ovine genetic diversity.","authors":"Eva Kjæld Hansen, Jens Ivan Í Gerðinum, Dag Inge Våge, Svein-Ole Mikalsen","doi":"10.1186/s12863-025-01319-0","DOIUrl":"https://doi.org/10.1186/s12863-025-01319-0","url":null,"abstract":"<p><strong>Background: </strong>Faroese sheep have an unclear history. While it is assumed that the Vikings brought sheep to the Faroes, traces of pre-Viking age sheep are also found. Historical sources cite disasters in a period around year 1600 that essentially eradicated the sheep population, and subsequent imports from Iceland to the northern part of Faroes, and from Shetland and Orkneys to the southern part of Faroes. We have here investigated the genetic relationship of northern Faroe sheep with other breeds.</p><p><strong>Results: </strong>A total of 359 sheep from four flocks from three Faroese islands (Streymoy, Eysturoy, Kalsoy) were genotyped using the GeneSeek Genomic Profiler Ovine 50K chip. The samples were clearly stratified into three groups corresponding to island of origin. This is likely due to the minimal transport of animals between the islands during extended periods of time. The Faroese samples were compared with the data from the Sheep HapMap database, representing breeds from different parts of the world, and, additionally, Norwegian White Sheep. The Northern European short-tailed breeds clearly stood out from the remaining global breeds, and Faroese sheep gained a peripheral position among the other North Atlantic short-tail breeds, with Icelandic sheep and Norwegian spael as their closest neighbors. The peripheral position suggests that the link to the surrounding breeds might be more distant than expected.</p><p><strong>Conclusions: </strong>Despite known imports of sheep from neighboring countries after the year 1600, this is poorly reflected in the genotyping data. 
One possible explanation could be that the present-day Faroese sheep have an unbroken genetic link to the pre-year 1300 Faroese sheep (which possibly were a mix of old-Norse and old-British/Irish animals), regardless of the presumed post-year 1600 influence from other breeds in the North Atlantic region.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"26 1","pages":"31"},"PeriodicalIF":1.9,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12008876/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144058382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Construction of a comprehensive library of repeated sequences for the annotation of Citrus genomes.","authors":"Delphine Giraud, Nathalie Choisne, Marilyne Summo, Stéphanie Sidibe-Bocs, Héléna Vassilieff, Gilles Costantino, Gaetan Droc, Pierre-Yves Teycheney, Florian Maumus, Patrick Ollitrault, François Luro","doi":"10.1186/s12863-025-01321-6","DOIUrl":"https://doi.org/10.1186/s12863-025-01321-6","url":null,"abstract":"<p><strong>Background: </strong>The comprehensive annotation of repeated sequences in genomes is an essential prerequisite for studying the dynamics of these sequences over time and their involvement in gene regulation. Currently, the diversity of repeated sequences in Citrus genomes is only partially characterized because the annotations have been performed using heterogeneous bioinformatics tools, each with its specificity and dedicated only to the annotation of transposable elements.</p><p><strong>Results: </strong>We combined complementary repeat-finding programs including REPET, CAULIFINDER, and TAREAN, to enable the identification of all types of repetitive sequences found in plant genomes, including transposable elements, endogenous caulimovirids, and satellite DNAs. A fine-grained annotation method was first developed to create a consensus sequence library of repeated sequences identified in the genome assemblies of C. medica, C. micrantha, C. reticulata, and C. maxima, the four ancestral parental species involved in the formation of economically valuable cultivated Citrus varieties. A second, faster annotation method was developed to enrich the dataset by adding new repeated sequences retrieved from genome assemblies of other Citrus species and closely related species belonging to the Aurantioideae subfamily. The final reference library contains 3,091 consensus sequences, of which 94.5% are transposable elements. 
The diversity of endogenous caulimovirids was characterized for the first time within the genus Citrus, contributing 160 consensus sequences to the final reference library. Finally, 10 satellite DNAs were also identified.</p><p><strong>Conclusion: </strong>Combining multiple repeat detection methods enables the comprehensive annotation of all repeated sequences in Citrus genomes. Using the final reference library reported in this work will improve our understanding of the dynamics of repeated sequences during Citrus speciation, particularly following the genome duplication and hybridization events that led to modern cultivars. The exploration of repeat position insertions along chromosomes using the developed web interface, RepeatLoc Citrus, will also make it possible to further investigate the role of transposable elements and endogenous caulimovirids in genome structure and gene regulation in Citrus species.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"26 1","pages":"30"},"PeriodicalIF":1.9,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12007355/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144059964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}