{"title":"Genome sequence and assembly of the amylolytic Bacillus licheniformis T5 strain isolated from Kazakhstan soil.","authors":"Arman Mussakhmetov, Assel Kiribayeva, Asset Daniyarov, Aitbay Bulashev, Ulykbek Kairov, Bekbolat Khassenov","doi":"10.1186/s12863-023-01177-8","DOIUrl":"10.1186/s12863-023-01177-8","url":null,"abstract":"<p><strong>Objectives: </strong>The data presented in this study were collected with the aim of obtaining the complete genome of a specific Bacillus strain, Bacillus licheniformis T5. This strain was chosen based on its enzymatic activities, particularly amylolytic activity. In this study, nanopore sequencing technology was employed to obtain the genome sequence of this strain. It is important to note that these data represent a focused objective within a larger research context, which involves exploring the biochemical features of promising Bacilli strains and investigating the relationship between enzymatic activity, phenotypic features, and the microorganism's genome.</p><p><strong>Data description: </strong>In this study, the whole-genome sequence was obtained from one Bacillus strain, Bacillus licheniformis T5, isolated from soil samples in Kazakhstan. Sample preparation and genomic DNA library construction were performed according to the Ligation sequencing gDNA kit (SQK-LSK109) protocol and NEBNext module. The prepared library was sequenced on a MinION instrument (Oxford Nanopore Technologies nanopore sequencer with a maximum throughput of up to 30 billion nucleotides per run and no limit on read length), using a FLO-MIN106D nanopore flow cell. The de novo genome assembly was performed using the long sequencing reads generated by the Oxford Nanopore MinION platform. Finally, one circular contig was obtained, with a length of 4,247,430 bp, a G + C content of 46.16%, and a mean coverage of 428X. B. 
licheniformis T5 genome assembly annotation revealed 5391 protein-coding sequences, 81 tRNAs, 51 repeat regions, 24 rRNAs, 3 virulence factors and 53 antibiotic resistance genes. This sequence encompasses the complete genetic information of the strain, including genes, regulatory elements, and noncoding regions. The data reveal important insights into the genetic characteristics, phenotypic traits, and enzymatic activity of this Bacillus strain. The findings of this study have particular value to researchers interested in microbial biology, biotechnology, and antimicrobial studies. The genomic sequence offers a foundation for understanding the genetic basis of traits such as endospore formation, alkaline tolerance, temperature range for growth, nutrient utilization, and enzymatic activities. These insights can contribute to the development of novel biotechnological applications, such as the production of enzymes for industrial purposes. Overall, this study provides valuable insights into the genetic characteristics, phenotypic traits, and enzymatic activities of the Bacillus licheniformis T5 strain. The acquired genomic sequences contribute to a better understanding of this strain and have implications for various research fields, such as microbiology, biotechnology, and antimicrobial studies.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"25 1","pages":"3"},"PeriodicalIF":0.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10759562/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139089552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BMC genomic data. Pub Date: 2024-01-02. DOI: 10.1186/s12863-023-01185-8
Keren Levinstein Hallak, Saharon Rosset
{"title":"Dating ancient splits in phylogenetic trees, with application to the human-Neanderthal split.","authors":"Keren Levinstein Hallak, Saharon Rosset","doi":"10.1186/s12863-023-01185-8","DOIUrl":"10.1186/s12863-023-01185-8","url":null,"abstract":"<p><strong>Background: </strong>We tackle the problem of estimating species TMRCAs (Time to Most Recent Common Ancestor), given a genome sequence from each species and a large known phylogenetic tree with a known structure (typically from one of the species). The number of transitions at each site from the first sequence to the other is assumed to be Poisson distributed, and only the parity of the number of transitions is observed. The detailed phylogenetic tree contains information about the transition rates in each site. We use this formulation to develop and analyze multiple estimators of the species' TMRCA. To test our methods, we use mtDNA substitution statistics from the well-established Phylotree as a baseline for data simulation such that the substitution rate per site mimics the real-world observed rates.</p><p><strong>Results: </strong>We evaluate our methods using simulated data and compare them to the Bayesian optimizing software BEAST2, showing that our proposed estimators are accurate for a wide range of TMRCAs and significantly outperform BEAST2. We then apply the proposed estimators on Neanderthal, Denisovan, and Chimpanzee mtDNA genomes to better estimate their TMRCA with modern humans and find that their TMRCA is substantially later, compared to values cited recently in the literature.</p><p><strong>Conclusions: </strong>Our methods utilize the transition statistics from the entire known human mtDNA phylogenetic tree (Phylotree), eliminating the requirement to reconstruct a tree encompassing the specific sequences of interest. Moreover, they demonstrate notable improvement in both running speed and accuracy compared to BEAST2, particularly for earlier TMRCAs like the human-Chimpanzee split. 
Our results date the human - Neanderthal TMRCA to be [Formula: see text] years ago, considerably later than values cited in other recent studies.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"25 1","pages":"4"},"PeriodicalIF":0.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10759710/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139089550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BMC genomic data. Pub Date: 2024-01-02. DOI: 10.1186/s12863-023-01187-6
LinQin Lu, GuoQing Li, FeiFei Liu
{"title":"High-quality genome resource of Lasiodiplodia pseudotheobromae associated with die-back on Eucalyptus trees.","authors":"LinQin Lu, GuoQing Li, FeiFei Liu","doi":"10.1186/s12863-023-01187-6","DOIUrl":"10.1186/s12863-023-01187-6","url":null,"abstract":"<p><strong>Objectives: </strong>Lasiodiplodia pseudotheobromae is an important fungal pathogen associated with die-back, canker and shoot blight in many plant hosts with a wide geographic distribution. The aim of our study was to provide high-quality genome assemblies and sequence annotation resources for L. pseudotheobromae, to facilitate future studies on the systematics, population genetics and genomics of this fungal pathogen.</p><p><strong>Data description: </strong>The genomes of five L. pseudotheobromae isolates were sequenced using the Oxford Nanopore Technologies (ONT) and Illumina HiSeq sequencing platforms. The total size of each assembly ranged from 43 Mb to 43.86 Mb, and over 11,000 protein-coding genes were predicted from each genome. The predicted proteins were annotated using multiple public databases; among the annotated protein-coding genes, more than 4,300 were predicted as potential virulence genes by the Pathogen Host Interactions (PHI) database. Moreover, comparative genomic analysis of L. pseudotheobromae and other closely related species revealed that 7,408 gene clusters were shared among them and 152 gene clusters were unique to L. pseudotheobromae. 
The genomes and associated datasets provided here will serve as a useful resource for further analyses of this fungal pathogen species.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"25 1","pages":"2"},"PeriodicalIF":0.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10759541/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139089553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BMC genomic data. Pub Date: 2024-01-02. DOI: 10.1186/s12863-023-01186-7
Andre S Chanderbali, Christopher Dervinis, Ioana G Anghel, Douglas E Soltis, Pamela S Soltis, Felipe Zapata
{"title":"Draft genome assemblies for two species of Escallonia (Escalloniales).","authors":"Andre S Chanderbali, Christopher Dervinis, Ioana G Anghel, Douglas E Soltis, Pamela S Soltis, Felipe Zapata","doi":"10.1186/s12863-023-01186-7","DOIUrl":"10.1186/s12863-023-01186-7","url":null,"abstract":"<p><strong>Objectives: </strong>Escallonia (Escalloniaceae) belongs to the Escalloniales, a diverse clade of flowering plants with unclear placement in the tree of life. Escallonia species show impressive morphological and ecological diversity and are widely distributed across three hotspots of biodiversity in the Neotropics. To shed light on the genomic substrate of this radiation and the phylogenetic placement of Escalloniales as well as to generate useful data for comparative evolutionary genomics across flowering plants, we produced and annotated draft genomes for two species of Escallonia.</p><p><strong>Data description: </strong>Genomic DNA from E. rubra and E. herrerae was sequenced with Oxford Nanopore sequencing chemistry, generating 3.4 and 12 million sequence reads with an average read length of 9.4 and 9.1 Kb (approximately 31 and 111 Gb of sequence data), respectively. In addition, we generated Illumina 100-bp paired-end short read data for E. rubra (approximately 75 Gb of sequence data). The assembled genome for E. rubra was 566 Mb, with 3,233 contigs and an N50 of 285 Kb. The assembled genome for E. herrerae was 994 Mb, with 5,760 contigs and an N50 of 317 Kb. The genome sequences were annotated with 31,038 (E. rubra) and 47,905 (E. herrerae) protein-coding gene models supported by transcriptome/protein evidence and/or Pfam domain content. 
BUSCO assessments indicated completeness levels of approximately 98% for the genome assemblies and 88% for the genome annotations.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"25 1","pages":"1"},"PeriodicalIF":0.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10759652/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139089551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BMC genomic data. Pub Date: 2023-12-18. DOI: 10.1186/s12863-023-01179-6
Dennis N Lozada, Karansher Singh Sandhu, Madhav Bhatta
{"title":"Ridge regression and deep learning models for genome-wide selection of complex traits in New Mexican Chile peppers.","authors":"Dennis N Lozada, Karansher Singh Sandhu, Madhav Bhatta","doi":"10.1186/s12863-023-01179-6","DOIUrl":"10.1186/s12863-023-01179-6","url":null,"abstract":"<p><strong>Background: </strong>Genomewide prediction estimates the genomic breeding values of selection candidates, which can be utilized for population improvement and cultivar development. Ridge regression and deep learning-based selection models were implemented for yield and agronomic traits of 204 chile pepper genotypes evaluated in multi-environment trials in New Mexico, USA.</p><p><strong>Results: </strong>Prediction accuracy differed across models under ten-fold cross-validation, with high prediction accuracy observed for highly heritable traits such as plant height and plant width. No model was superior across traits using 14,922 SNP markers for genomewide selection. Bayesian ridge regression had the highest average accuracy for first pod date (0.77) and total yield per plant (0.33). Multilayer perceptron (MLP) had the highest accuracy for flowering time (0.76) and plant height (0.73), whereas the genomic BLUP model had the highest accuracy for plant width (0.62). Using a subset of 7,690 SNP loci, obtained by grouping markers based on linkage disequilibrium coefficients, improved accuracy for first pod date, ten pod weight, and total yield per plant, even under a relatively small training population size for MLP and random forest models. Genomic and ridge regression BLUP models were sufficient for optimal prediction accuracies for small training population size. 
Combining phenotypic selection and genomewide selection resulted in improved selection response for yield-related traits, indicating that integrated approaches can result in improved gains achieved through selection.</p><p><strong>Conclusions: </strong>Accuracy values for ridge regression and deep learning prediction models demonstrate the potential of implementing genomewide selection for genetic improvement in chile pepper breeding programs. Ultimately, a large training dataset is relevant for improved genomic selection accuracy for the deep learning models.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"24 1","pages":"80"},"PeriodicalIF":0.0,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10726521/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138811331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Association between MC1R gene and coat color segregation in Shanxia long black pig and Lulai black pig.","authors":"Hao Zheng, San-Ya Xiong, Shi-Jun Xiao, Ze-Kai Zhang, Jin-Min Tu, Deng-Shuai Cui, Nai-Biao Yu, Zhi-Yong Huang, Long-Yun Li, Yuan-Mei Guo","doi":"10.1186/s12863-023-01161-2","DOIUrl":"10.1186/s12863-023-01161-2","url":null,"abstract":"<p><strong>Background: </strong>Coat color, as a distinct phenotypic characteristic of pigs, is often subject to preference and selection, such as in the breeding of new breeds. The Shanxia long black pig was derived from an intercross between Berkshire boars and Licha black pig sows, and it was bred as a paternal strain with high-quality meat and black coat color. Although the coat color was black in the F<sub>1</sub> generation of the intercross, it segregated in the subsequent generations. This study aims to decode the genetic basis of coat color segregation and develop a method to distinguish black pigs from spotted ones in the Shanxia long black pig.</p><p><strong>Results: </strong>Only one QTL was mapped, at the proximal end of chromosome 6, and the MC1R gene was selected as the functional candidate gene. A total of 11 polymorphic loci were identified in the MC1R gene, and only the c.67_68insCC variant co-segregated with coat color. This locus is not recognized by any restriction endonuclease, so it cannot be genotyped by PCR-RFLP. The c.370G > A polymorphic locus was also significantly associated with coat color and is in tight linkage disequilibrium with c.67_68insCC. Furthermore, it is recognized by BspHI. Therefore, a PCR-RFLP method was set up to genotype this locus. 
In addition to the 175 sequenced individuals, a further 1,391 pigs were genotyped with PCR-RFLP, and all pigs with the GG genotype (one band) were black.</p><p><strong>Conclusion: </strong>The MC1R gene (c.67_68insCC) is the causative gene (mutation) for the coat color segregation, and PCR-RFLP genotyping of c.370G > A could be used in the breeding program of the Shanxia long black pig.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"24 1","pages":"74"},"PeriodicalIF":0.0,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10691012/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138464740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genome assembly of Erythrophleum Fordii, a special \"ironwood\" tree in China.","authors":"Chang-Yu Wen, Ju-Yu Lian, Wei-Xiong Peng, Zheng-Feng Wang, Zhi-Gang Yang, Hong-Lin Cao","doi":"10.1186/s12863-023-01176-9","DOIUrl":"10.1186/s12863-023-01176-9","url":null,"abstract":"<p><strong>Objectives: </strong>Erythrophleum is a genus in the Fabaceae family. The genus contains only about 10 species, and it is best known worldwide for its hardwood and medicinal properties. Erythrophleum fordii Oliv. is the only species of this genus distributed in China. It has superior wood and can be used in folk medicine, which has led to its overexploitation in the wild. To support its effective conservation and to elucidate the distinctive genetic traits underlying wood formation and medicinal components, we present its first genome assembly.</p><p><strong>Data description: </strong>This work generated ~ 160.8 Gb raw Nanopore whole genome sequencing (WGS) long reads, ~ 126.0 Gb raw MGI WGS short reads and ~ 29.0 Gb raw RNA-seq reads using E. fordii leaf tissues. The de novo assembly of the E. fordii genome contained 864,825,911 bp, with 59 contigs and a contig N50 of 30,830,834 bp. Benchmarking Universal Single-Copy Orthologs (BUSCO) revealed 98.7% completeness of the assembly. The assembly contained 471,006,885 bp (54.4%) repetitive sequences and 28,761 genes that coded for 33,803 proteins. 
The protein sequences were functionally annotated against multiple databases, facilitating comparative genomic analysis.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"24 1","pages":"73"},"PeriodicalIF":0.0,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10685560/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138453257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BMC genomic data. Pub Date: 2023-11-28. DOI: 10.1186/s12863-023-01175-w
Yanan Liu, Bo Han, Weijie Zheng, Peng Peng, Chendong Yang, Guie Jiang, Yabin Ma, Jianming Li, Junqing Ni, Dongxiao Sun
{"title":"Identification of genetic associations and functional SNPs of bovine KLF6 gene on milk production traits in Chinese holstein.","authors":"Yanan Liu, Bo Han, Weijie Zheng, Peng Peng, Chendong Yang, Guie Jiang, Yabin Ma, Jianming Li, Junqing Ni, Dongxiao Sun","doi":"10.1186/s12863-023-01175-w","DOIUrl":"10.1186/s12863-023-01175-w","url":null,"abstract":"<p><strong>Background: </strong>Our previous research identified the Kruppel like factor 6 (KLF6) gene as a prospective candidate for milk production traits in dairy cattle. The expression of KLF6 in the livers of Holstein cows during the peak of lactation was significantly higher than that during the dry and early lactation periods. Notably, it plays an essential role in activating peroxisome proliferator-activated receptor α (PPARα) signaling pathways. The primary aim of this study was to further substantiate whether the KLF6 gene has significant genetic effects on milk traits in dairy cattle.</p><p><strong>Results: </strong>Through direct sequencing of PCR products with pooled DNA, we identified a total of 12 single nucleotide polymorphisms (SNPs) within the KLF6 gene. The SNPs comprise 7 in the 5' flanking region, 2 in exon 2 and 3 in the 3' untranslated region (UTR). Of these, g.44601035G > A is a missense mutation that results in the replacement of arginine (CGG) with glutamine (CAG), consequently leading to alterations in the secondary structure of the KLF6 protein, as predicted by SOPMA. The remaining 7 regulatory SNPs significantly impacted the transcriptional activity of KLF6 following mutation (P < 0.005), manifesting as changes in transcription factor binding sites. Additionally, 4 SNPs located in both the UTR and exons were predicted to influence the secondary structure of KLF6 mRNA using the RNAfold web server. 
Furthermore, we performed a genotype-phenotype association analysis using SAS 9.2, which found that all 12 SNPs were significantly associated with milk yield, fat yield, fat percentage, protein yield and protein percentage in both the first and second lactations (P < 0.0001 ~ 0.0441). Also, using Haploview 4.2 software, we found that the 12 SNPs were closely linked and formed a haplotype block, which was strongly associated with the five milk traits (P < 0.0001 ~ 0.0203).</p><p><strong>Conclusions: </strong>In summary, our study demonstrated that the KLF6 gene has significant impacts on milk yield and composition traits in dairy cattle. Among the identified SNPs, 7 were implicated in modulating milk traits by impacting transcriptional activity, 4 by altering mRNA secondary structure, and 1 by affecting the protein secondary structure of KLF6. These findings provide valuable molecular insights for genomic selection programs in dairy cattle.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"24 1","pages":"72"},"PeriodicalIF":0.0,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10685595/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138453258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BMC genomic data. Pub Date: 2023-11-21. DOI: 10.1186/s12863-023-01169-8
Antonio R Romeu
{"title":"Probable human origin of the SARS-CoV-2 polybasic furin cleavage motif.","authors":"Antonio R Romeu","doi":"10.1186/s12863-023-01169-8","DOIUrl":"10.1186/s12863-023-01169-8","url":null,"abstract":"<p><strong>Background: </strong>The key evolutionary step leading to the pandemic virus was the acquisition of the PRRA furin cleavage motif at the spike glycoprotein S1/S2 junction by a progenitor of SARS-CoV-2. Two of its features draw attention: (i) it is absent in other known lineage B beta-coronaviruses, including the newly discovered coronaviruses in bats from Laos and Vietnam, which are the closest known relatives of the covid virus; and, (ii) it introduced the pair of arginine codons (CGG-CGG), whose usage is extremely rare in coronaviruses. With an occurrence rate of only 3%, the arginine CGG codon is considered a minority in SARS CoV-2. On the other hand, Laos and Vietnam bat coronaviruses contain receptor-binding domains that are almost identical to that of SARS-CoV-2 and can therefore infect human cells despite the absence of the furin cleavage motif.</p><p><strong>Results: </strong>Based on these data, the aim of this work is to provide a detailed sequence analysis between the SARS-CoV-2 S gene insert encoding PRRA and the human mRNA transcripts. The result showed a 100% match to several mRNA transcripts. The set of human genes whose mRNAs match this S gene insert are ubiquitous and highly expressed, e.g., the ATPase F1 (ATP5F1) and the ubiquitin specific peptidase 21 (USP21) genes; or specific genes of target organs or tissues of the SARS-CoV-2 infection (e.g., MEMO1, SALL3, TRIM17, CWC15, CCDC187, FAM71E2, GAB4, PRDM13). 
The results suggest that recombination between the genome of a SARS-CoV-2 progenitor and human mRNA transcripts could be the origin of the S gene 12-nucleotide insert encoding the S protein PRRA motif.</p><p><strong>Conclusions: </strong>The hypothesis of probable human origin of the SARS-CoV-2 polybasic furin cleavage motif is supported by: (i) the nature of the human genes whose mRNA sequences match the S gene insert 100%; (ii) the synonymous base substitution in the arginine codons (CGG-CGG); and (iii) further spike glycoprotein PRRA-like insertions, suggesting that the acquisition of PRRA may not have been a single recombination event.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"24 1","pages":"71"},"PeriodicalIF":1.9,"publicationDate":"2023-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10664542/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138292550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BMC genomic data. Pub Date: 2023-11-20. DOI: 10.1186/s12863-023-01172-z
Didier Delourme, Laure Brémaud, Idelette Plazanet, Patrick Pélissier, Philippe Label, Nathalie Boizot, Christian Breton, Stéphanie Durand, Guy Costa
{"title":"Transcriptomic monitoring of Douglas-fir heartwood formation.","authors":"Didier Delourme, Laure Brémaud, Idelette Plazanet, Patrick Pélissier, Philippe Label, Nathalie Boizot, Christian Breton, Stéphanie Durand, Guy Costa","doi":"10.1186/s12863-023-01172-z","DOIUrl":"10.1186/s12863-023-01172-z","url":null,"abstract":"<p><strong>Objectives: </strong>Molecular cues linked to heartwood formation open new (complementary) perspectives for genetic breeding programs of Douglas-fir, a tree species widely cultivated in Europe for the natural durability and civil engineering properties of its wood.</p><p><strong>Data description: </strong>RNAs from a single genotype of Douglas-fir were extracted from three distinct wood zones (outer sapwood, inner sapwood and transition zone) in four vegetative seasons to generate an extensive RNA-seq dataset for studying the in-wood dynamics and seasonality of heartwood formation in this hardwood model species. Previously published data collected on somatic embryos of the same genotype could be merged with the present dataset to upgrade the Douglas-fir reference transcriptome.</p>","PeriodicalId":72427,"journal":{"name":"BMC genomic data","volume":"24 1","pages":"69"},"PeriodicalIF":0.0,"publicationDate":"2023-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10662504/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138178169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}