Near-complete assembly and comprehensive annotation of the wheat Chinese Spring genome.
Authors: Zijian Wang, Lingfeng Miao, Kaiwen Tan, Weilong Guo, Beibei Xin, Rudi Appels, Jizeng Jia, Jinsheng Lai, Fei Lu, Zhongfu Ni, Xiangdong Fu, Qixin Sun, Jian Chen
Journal: Molecular Plant (Journal Article; published 2025-02-13)
DOI: 10.1016/j.molp.2025.02.002
URL: https://doi.org/10.1016/j.molp.2025.02.002
Journal impact factor: 17.1 (JCR Q1, Biochemistry & Molecular Biology)
Open access: No
Citations: 0
Abstract
A complete reference genome assembly is crucial for biological research and genetic improvement. Owing to its large size and highly repetitive nature, the globally used assembly of the wheat Chinese Spring (CS) genome contains numerous gaps. In this study, we generated a 14.46 Gb near-complete assembly of the CS genome, with a contig N50 of over 266 Mb and an overall base accuracy of 99.9963%. Of the 290 gaps that remained (26, 257, and 7 in the A, B, and D subgenomes, respectively), 278 were extremely high-copy tandem repeats, whereas the remaining 12 were associated with transposable elements. Four chromosome assemblies were completely gap-free, namely chr1D, chr3D, chr4D, and chr5D. Extensive annotation of the near-complete genome revealed 151 405 high-confidence genes, of which 59 180 were newly annotated, including 7602 newly assembled genes. Except for the centromere of chr1B, which has a gap associated with superlong GAA repeat arrays, the centromeric sequences of all the remaining 20 chromosomes were completely assembled. Our near-complete assembly revealed that the extent of tandem repeats, such as simple-sequence repeats, was highly uneven among the subgenomes. Similarly, the repeat composition of the centromeres varied among the three subgenomes. With the genomic sequences of all six types of seed storage proteins (SSPs) fully assembled, we found that ω-gliadin expression came entirely from the B subgenome, whereas expression of the other five SSP types came predominantly from the D subgenome. The near-complete CS genome will serve as a valuable resource for genomic and functional genomic research and for the breeding of wheat and its related species.
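Contig N50, the contiguity figure quoted above, is the contig length L such that contigs of length L or longer together cover at least half of the total assembly length. The minimal Python sketch below implements that standard definition. The function name `n50` and the toy contig lengths are hypothetical illustrations, not code or data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total length. (Hypothetical helper,
    written only to illustrate the metric.)
    """
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downward, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid float division.
        if 2 * running >= total:
            return length


# Toy example: total = 280, half = 140; 100 alone is not enough,
# but 100 + 80 = 180 >= 140, so N50 is 80.
print(n50([100, 80, 50, 30, 20]))  # 80
```

Because N50 is weighted by length, a few very long contigs (such as near-chromosome-scale contigs) can raise it substantially even when many short contigs remain.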
About the journal:
Molecular Plant is dedicated to serving the plant science community by publishing novel and exciting findings with high significance in plant biology. The journal focuses broadly on cellular biology, physiology, biochemistry, molecular biology, genetics, development, plant-microbe interaction, genomics, bioinformatics, and molecular evolution.
Molecular Plant publishes original research articles, reviews, Correspondence, and Spotlights on the most important developments in plant biology.