{"title":"柑橘基因组重复序列综合文库的构建。","authors":"Delphine Giraud, Nathalie Choisne, Marilyne Summo, Stéphanie Sidibe-Bocs, Héléna Vassilieff, Gilles Costantino, Gaetan Droc, Pierre-Yves Teycheney, Florian Maumus, Patrick Ollitrault, François Luro","doi":"10.1186/s12863-025-01321-6","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The comprehensive annotation of repeated sequences in genomes is an essential prerequisite for studying the dynamics of these sequences over time and their involvement in gene regulation. Currently, the diversity of repeated sequences in Citrus genomes is only partially characterized because the annotations have been performed using heterogeneous bioinformatics tools, each with its specificity and dedicated only to the annotation of transposable elements.</p><p><strong>Results: </strong>We combined complementary repeat-finding programs including REPET, CAULIFINDER, and TAREAN, to enable the identification of all types of repetitive sequences found in plant genomes, including transposable elements, endogenous caulimovirids, and satellite DNAs. A fine-grained annotation method was first developed to create a consensus sequence library of repeated sequences identified in the genome assemblies of C. medica, C. micrantha, C. reticulata, and C. maxima, the four ancestral parental species involved in the formation of economically valuable cultivated Citrus varieties. A second, faster annotation method was developed to enrich the dataset by adding new repeated sequences retrieved from genome assemblies of other Citrus species and closely related species belonging to the Aurantioideae subfamily. The final reference library contains 3,091 consensus sequences, of which 94.5% are transposable elements. The diversity of endogenous caulimovirids was characterized for the first time within the genus Citrus, contributing 160 consensus sequences to the final reference library. Finally, 10 satellite DNAs were also identified.</p><p><strong>Conclusion: </strong>Combining multiple repeat detection methods enables the comprehensive annotation of all repeated sequences in Citrus genomes. Using the final reference library reported in this work will improve our understanding of the dynamics of repeated sequences during Citrus speciation, particularly following the genome duplication and hybridization events that led to modern cultivars. The exploration of repeat position insertions along chromosomes using the developed web interface, RepeatLoc Citrus, will also make it possible to further investigate the role of transposable elements and endogenous caulimovirids in genome structure and gene regulation in Citrus species.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"26 1","pages":"30"},"PeriodicalIF":1.9000,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12007355/pdf/","citationCount":"0","resultStr":"{\"title\":\"Construction of a comprehensive library of repeated sequences for the annotation of Citrus genomes.\",\"authors\":\"Delphine Giraud, Nathalie Choisne, Marilyne Summo, Stéphanie Sidibe-Bocs, Héléna Vassilieff, Gilles Costantino, Gaetan Droc, Pierre-Yves Teycheney, Florian Maumus, Patrick Ollitrault, François Luro\",\"doi\":\"10.1186/s12863-025-01321-6\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>The comprehensive annotation of repeated sequences in genomes is an essential prerequisite for studying the dynamics of these sequences over time and their involvement in gene regulation. Currently, the diversity of repeated sequences in Citrus genomes is only partially characterized because the annotations have been performed using heterogeneous bioinformatics tools, each with its specificity and dedicated only to the annotation of transposable elements.</p><p><strong>Results: </strong>We combined complementary repeat-finding programs including REPET, CAULIFINDER, and TAREAN, to enable the identification of all types of repetitive sequences found in plant genomes, including transposable elements, endogenous caulimovirids, and satellite DNAs. A fine-grained annotation method was first developed to create a consensus sequence library of repeated sequences identified in the genome assemblies of C. medica, C. micrantha, C. reticulata, and C. maxima, the four ancestral parental species involved in the formation of economically valuable cultivated Citrus varieties. A second, faster annotation method was developed to enrich the dataset by adding new repeated sequences retrieved from genome assemblies of other Citrus species and closely related species belonging to the Aurantioideae subfamily. The final reference library contains 3,091 consensus sequences, of which 94.5% are transposable elements. The diversity of endogenous caulimovirids was characterized for the first time within the genus Citrus, contributing 160 consensus sequences to the final reference library. Finally, 10 satellite DNAs were also identified.</p><p><strong>Conclusion: </strong>Combining multiple repeat detection methods enables the comprehensive annotation of all repeated sequences in Citrus genomes. Using the final reference library reported in this work will improve our understanding of the dynamics of repeated sequences during Citrus speciation, particularly following the genome duplication and hybridization events that led to modern cultivars. The exploration of repeat position insertions along chromosomes using the developed web interface, RepeatLoc Citrus, will also make it possible to further investigate the role of transposable elements and endogenous caulimovirids in genome structure and gene regulation in Citrus species.</p>\",\"PeriodicalId\":72427,\"journal\":{\"name\":\"BMC genomic data\",\"volume\":\"26 1\",\"pages\":\"30\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2025-04-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12007355/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMC genomic data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1186/s12863-025-01321-6\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"GENETICS & HEREDITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC genomic data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s12863-025-01321-6","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"GENETICS & HEREDITY","Score":null,"Total":0}
Construction of a comprehensive library of repeated sequences for the annotation of Citrus genomes.
Background: The comprehensive annotation of repeated sequences in genomes is an essential prerequisite for studying the dynamics of these sequences over time and their involvement in gene regulation. Currently, the diversity of repeated sequences in Citrus genomes is only partially characterized because the annotations have been performed using heterogeneous bioinformatics tools, each with its specificity and dedicated only to the annotation of transposable elements.
Results: We combined complementary repeat-finding programs including REPET, CAULIFINDER, and TAREAN, to enable the identification of all types of repetitive sequences found in plant genomes, including transposable elements, endogenous caulimovirids, and satellite DNAs. A fine-grained annotation method was first developed to create a consensus sequence library of repeated sequences identified in the genome assemblies of C. medica, C. micrantha, C. reticulata, and C. maxima, the four ancestral parental species involved in the formation of economically valuable cultivated Citrus varieties. A second, faster annotation method was developed to enrich the dataset by adding new repeated sequences retrieved from genome assemblies of other Citrus species and closely related species belonging to the Aurantioideae subfamily. The final reference library contains 3,091 consensus sequences, of which 94.5% are transposable elements. The diversity of endogenous caulimovirids was characterized for the first time within the genus Citrus, contributing 160 consensus sequences to the final reference library. Finally, 10 satellite DNAs were also identified.
Conclusion: Combining multiple repeat detection methods enables the comprehensive annotation of all repeated sequences in Citrus genomes. Using the final reference library reported in this work will improve our understanding of the dynamics of repeated sequences during Citrus speciation, particularly following the genome duplication and hybridization events that led to modern cultivars. The exploration of repeat position insertions along chromosomes using the developed web interface, RepeatLoc Citrus, will also make it possible to further investigate the role of transposable elements and endogenous caulimovirids in genome structure and gene regulation in Citrus species.