Veniamin Fishman, Yuri Kuratov, Aleksei Shmelev, Maxim Petrov, Dmitry Penzar, Denis Shepelin, Nikolay Chekanov, Olga Kardymon, Mikhail Burtsev
{"title":"GENA-LM: a family of open-source foundational DNA language models for long sequences","authors":"Veniamin Fishman, Yuri Kuratov, Aleksei Shmelev, Maxim Petrov, Dmitry Penzar, Denis Shepelin, Nikolay Chekanov, Olga Kardymon, Mikhail Burtsev","doi":"10.1093/nar/gkae1310","DOIUrl":"https://doi.org/10.1093/nar/gkae1310","url":null,"abstract":"Recent advancements in genomics, propelled by artificial intelligence, have unlocked unprecedented capabilities in interpreting genomic sequences, mitigating the need for exhaustive experimental analysis of complex, intertwined molecular processes inherent in DNA function. A significant challenge, however, resides in accurately decoding genomic sequences, which inherently involves comprehending rich contextual information dispersed across thousands of nucleotides. To address this need, we introduce GENA language model (GENA-LM), a suite of transformer-based foundational DNA language models capable of handling input lengths up to 36 000 base pairs. Notably, integrating the newly developed recurrent memory mechanism allows these models to process even larger DNA segments. We provide pre-trained versions of GENA-LM, including multispecies and taxon-specific models, demonstrating their capability for fine-tuning and addressing a spectrum of complex biological tasks with modest computational demands. While language models have already achieved significant breakthroughs in protein biology, GENA-LM showcases a similarly promising potential for reshaping the landscape of genomics and multi-omics data analysis. All models are publicly available on GitHub (https://github.com/AIRI-Institute/GENA_LM) and on HuggingFace (https://huggingface.co/AIRI-Institute). 
In addition, we provide a web service (https://dnalm.airi.net/) allowing user-friendly DNA annotation with GENA-LM models.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"1 1","pages":""},"PeriodicalIF":14.9,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142986732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A trigger-inducible split-Csy4 architecture for programmable RNA modulation","authors":"Lihang Zhang, Xinyuan Qiu, Yuting Zhou, Zhengyang Luo, Lingyun Zhu, Jiawei Shao, Mingqi Xie, Hui Wang","doi":"10.1093/nar/gkae1319","DOIUrl":"https://doi.org/10.1093/nar/gkae1319","url":null,"abstract":"The CRISPR-derived endoribonuclease Csy4 is a popular tool for controlling transgene expression in various therapeutically relevant settings, but adverse effects potentially arising from non-specific RNA cleavage remain largely unexplored. Here, we report a split-Csy4 architecture that was carefully optimized for in vivo usage. First, we separated Csy4 into two independent protein moieties whose full catalytic activity can be restored via various constitutive or conditional protein dimerization systems. Next, we show that introduction of split-Csy4 into human cells caused a substantially reduced extent of perturbation of the endogenous transcriptome when directly compared to full-length Csy4. Inspired by these results, we went on to use such a split-Csy4 module to engineer inducible CRISPR- and translation-level gene switches regulated by the FDA-approved drug grazoprevir. This work provides a valuable resource for Csy4-related biomedical research and discusses important issues for the development of clinically eligible regulation tools.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"23 1","pages":""},"PeriodicalIF":14.9,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142986733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cryo-EM structure of human TUT1:U6 snRNA complex","authors":"Seisuke Yamashita, Kozo Tomita","doi":"10.1093/nar/gkae1314","DOIUrl":"https://doi.org/10.1093/nar/gkae1314","url":null,"abstract":"U6 snRNA (small nuclear ribonucleic acid) is a ribozyme that catalyzes pre-messenger RNA (pre-mRNA) splicing and undergoes epitranscriptomic modifications. After transcription, the 3′-end of U6 snRNA is oligo-uridylylated by the multi-domain terminal uridylyltransferase (TUTase), TUT1. The 3′- oligo-uridylylated tail of U6 snRNA is crucial for U4/U6 di-snRNP (small nuclear ribonucleoprotein) formation and pre-mRNA splicing. Here, we present the cryo-electron microscopy structure of the human TUT1:U6 snRNA complex. The AUA-rich motif between the 5′-short stem-loop and the telestem of U6 snRNA is clamped by the N-terminal zinc finger (ZF)–RNA recognition motif and the catalytic Palm of TUT1, and the telestem is gripped by the N-terminal ZF and the Fingers, positioning the 3′-end of the telestem in the catalytic pocket. The internal stem-loop in the 3′-stem-loop of U6 snRNA is anchored by the C-terminal kinase-associated 1 domain, preventing U6 snRNA from dislodging on the TUT1 surface during oligo-uridylylation. TUT1 recognizes the sequence and structural features of U6 snRNA, and holds the entire U6 snRNA body using multiple domains to ensure oligo-uridylylation. 
This highlights the specificity of TUT1 as a U6 snRNA-targeting TUTase.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"42 1","pages":""},"PeriodicalIF":14.9,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142986734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cyrielle Petibon, Mathieu Catala, Danna Morales, Shanker Shyam Panchapakesan, Peter J Unrau, Sherif Abou Elela
{"title":"Transcription factors induce differential splicing of duplicated ribosomal protein genes during meiosis","authors":"Cyrielle Petibon, Mathieu Catala, Danna Morales, Shanker Shyam Panchapakesan, Peter J Unrau, Sherif Abou Elela","doi":"10.1093/nar/gkae1321","DOIUrl":"https://doi.org/10.1093/nar/gkae1321","url":null,"abstract":"In baker’s yeast, genes encoding ribosomal proteins often exist as duplicate pairs, typically with one ‘major’ paralog highly expressed and a ‘minor’ less expressed paralog that undergoes controlled expression through reduced splicing efficiency. In this study, we investigate the regulatory mechanisms controlling splicing of the minor paralog of the uS4 protein gene (RPS9A), demonstrating that its splicing is repressed during vegetative growth but upregulated during meiosis. This differential splicing of RPS9A is mediated by two transcription factors, Rim101 and Taf14. Deletion of either RIM101 or TAF14 not only induces the splicing and expression of RPS9A with little effect on the major paralog RPS9B, but also differentially alters the splicing of reporter constructs containing only the RPS9 introns. Both Rim101 and Taf14 co-immunoprecipitate with the chromatin and RNA of the RPS9 genes, indicating that these transcription factors may affect splicing co-transcriptionally. Deletion of the RPS9A intron, RIM101 or TAF14 dysregulates RPS9A expression, impairing the timely expression of RPS9 during meiosis. Complete deletion of RPS9A impairs the expression pattern of meiotic genes and inhibits sporulation in yeast. 
These findings suggest a regulatory strategy whereby transcription factors modulate the splicing of duplicated ribosomal protein genes to fine-tune their expression in different cellular states.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"7 1","pages":""},"PeriodicalIF":14.9,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142986731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GDBr: genomic signature interpretation tool for DNA double-strand break repair mechanisms","authors":"Hyunwoo Ryu, Hyunho Han, Chuna Kim, Jun Kim","doi":"10.1093/nar/gkae1295","DOIUrl":"https://doi.org/10.1093/nar/gkae1295","url":null,"abstract":"Large genetic variants can be generated via homologous recombination (HR), such as polymerase theta-mediated end joining (TMEJ) or single-strand annealing (SSA). Given that these HR-based mechanisms leave specific genomic signatures, we developed GDBr, a genomic signature interpretation tool for DNA double-strand break repair mechanisms using high-quality genome assemblies. We applied GDBr to a draft human pangenome reference. We found that 78.1% of non-repetitive insertions and deletions and 11.0% of non-repetitive complex substitutions contained specific signatures. Of these, we interpreted that 98.7% and 1.3% of the insertions and deletions were generated via TMEJ and SSA, respectively, and all complex substitutions via TMEJ. Since population-level pangenome datasets are being dramatically accumulated, GDBr can provide mechanistic insights into how variants are formed. GDBr is available on GitHub at https://github.com/Chemical118/GDBr.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"7 1","pages":""},"PeriodicalIF":14.9,"publicationDate":"2025-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142961616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yang Li, Yaokang Wu, Xianhao Xu, Yanfeng Liu, Jianghua Li, Guocheng Du, Xueqin Lv, Yangyang Li, Long Liu
{"title":"A cross-species inducible system for enhanced protein expression and multiplexed metabolic pathway fine-tuning in bacteria.","authors":"Yang Li, Yaokang Wu, Xianhao Xu, Yanfeng Liu, Jianghua Li, Guocheng Du, Xueqin Lv, Yangyang Li, Long Liu","doi":"10.1093/nar/gkae1315","DOIUrl":"10.1093/nar/gkae1315","url":null,"abstract":"Inducible systems are crucial to metabolic engineering and synthetic biology, enabling organisms that function as biosensors and produce valuable compounds. However, almost all inducible systems are strain-specific, limiting comparative analyses and applications across strains rapidly. This study designed and presented a robust workflow for developing the cross-species inducible system. By applying this approach, two reconstructed inducible systems (a 2,4-diacetylphloroglucinol-inducible system PphlF3R1 and an anhydrotetracycline-inducible system Ptet2R2*) were successfully developed and demonstrated to function in three model microorganisms, including Escherichia coli, Bacillus subtilis and Corynebacterium glutamicum. To enhance their practicality, both inducible systems were subsequently placed on the plasmid and genome for detailed characterization to determine the optimal expression conditions. Furthermore, the more efficient inducible system Ptet2R2* was employed to express various reporter proteins and gene clusters in these three strains. Moreover, the aTc-inducible system Ptet2R2*, combined with T7 RNA polymerase and dCas12a, was utilized to develop a single-input genetic circuit that enables the simultaneous activation and repression of gene expression. 
Overall, the cross-species inducible system serves as a stringent, controllable and effective tool for protein expression and metabolic pathway control in different bacteria.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"53 2","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11724366/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142965773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heidar J Koning, Jia Y Lai, Andrew C Marshall, Elke Stroeher, Gavin Monahan, Anuradha Pullakhandam, Gavin J Knott, Timothy M Ryan, Archa H Fox, Andrew Whitten, Mihwa Lee, Charles S Bond
{"title":"Structural plasticity of the coiled-coil interactions in human SFPQ.","authors":"Heidar J Koning, Jia Y Lai, Andrew C Marshall, Elke Stroeher, Gavin Monahan, Anuradha Pullakhandam, Gavin J Knott, Timothy M Ryan, Archa H Fox, Andrew Whitten, Mihwa Lee, Charles S Bond","doi":"10.1093/nar/gkae1198","DOIUrl":"10.1093/nar/gkae1198","url":null,"abstract":"The proteins SFPQ (splicing Factor Proline/Glutamine rich) and NONO (non-POU domain-containing octamer-binding protein) are mammalian members of the Drosophila Behaviour/Human Splicing (DBHS) protein family, which share 76% sequence identity in their conserved 320 amino acid DBHS domain. SFPQ and NONO are involved in all steps of post-transcriptional regulation and are primarily located in mammalian paraspeckles: liquid phase-separated, ribonucleoprotein sub-nuclear bodies templated by NEAT1 long non-coding RNA. A combination of structured and low-complexity regions provide polyvalent interaction interfaces that facilitate homo- and heterodimerisation, polymerisation, interactions with oligonucleotides, mRNA, long non-coding RNA, and liquid phase-separation, all of which have been implicated in cellular homeostasis and neurological diseases including neuroblastoma. The strength and competition of these interaction modes define the ability of DBHS proteins to dissociate from paraspeckles to fulfil functional roles throughout the nucleus or the cytoplasm. In this study, we define and dissect the coiled-coil interactions which promote the polymerisation of DBHS proteins, using a crystal structure of an SFPQ/NONO heterodimer which reveals a flexible coiled-coil interaction interface which differs from previous studies. We support this through extensive solution small-angle X-ray scattering experiments using a panel of SFPQ/NONO heterodimer variants which are capable of tetramerisation to varying extents. 
The QM mutant displayed a negligible amount of tetramerisation (quadruple loss of function coiled-coil mutant L535A/L539A/L546A/M549A), the Charged Single Alpha Helix (ΔCSAH) variant displayed a dimer-tetramer equilibrium interaction, and the disulfide-forming variant displayed constitutive tetramerisation (R542C which mimics the pathological Drosophila nonAdiss allele). We demonstrate that newly characterised coiled-coil interfaces play a role in the polymerisation of DBHS proteins in addition to the previously described canonical coiled-coil interface. The detail of these interactions provides insight into a process critical for the assembly of paraspeckles as well as the behaviour of SFPQ as a transcription factor, and general multipurpose auxiliary protein with functions essential to mammalian life. Our understanding of the coiled coil behaviour of SFPQ also enhances the explanatory power of mutations (often disease-associated) observed in the DBHS family, potentially allowing for the development of future medical options such as targeted gene therapy.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":" ","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11754644/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142854860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jinhong Dong, Kizhakke Mattada Sathyan, Thomas G Scott, Rudradeep Mukherjee, Michael J Guertin
{"title":"ZNF143 binds DNA and stimulates transcription initiation to activate and repress direct target genes.","authors":"Jinhong Dong, Kizhakke Mattada Sathyan, Thomas G Scott, Rudradeep Mukherjee, Michael J Guertin","doi":"10.1093/nar/gkae1182","DOIUrl":"10.1093/nar/gkae1182","url":null,"abstract":"Transcription factors bind to sequence motifs and act as activators or repressors. Transcription factors interface with a constellation of accessory cofactors to regulate distinct mechanistic steps to regulate transcription. We rapidly degraded the essential and pervasively expressed transcription factor ZNF143 to determine its function in the transcription cycle. ZNF143 facilitates RNA polymerase initiation and activates gene expression. ZNF143 binds the promoter of nearly all its activated target genes. ZNF143 also binds near the site of genic transcription initiation to directly repress a subset of genes. Although ZNF143 stimulates initiation at ZNF143-repressed genes (i.e. those that increase transcription upon ZNF143 depletion), the molecular context of binding leads to cis repression. ZNF143 competes with other more efficient activators for promoter access, physically occludes transcription initiation sites and promoter-proximal sequence elements, and acts as a molecular roadblock to RNA polymerases during early elongation. The term context specific is often invoked to describe transcription factors that have both activation and repression functions. 
We define the context and molecular mechanisms of ZNF143-mediated cis activation and repression.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":" ","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11754675/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142829404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yunshun Chen, Lizhong Chen, Aaron T L Lun, Pedro L Baldoni, Gordon K Smyth
{"title":"edgeR v4: powerful differential analysis of sequencing data with expanded functionality and improved support for small counts and larger datasets.","authors":"Yunshun Chen, Lizhong Chen, Aaron T L Lun, Pedro L Baldoni, Gordon K Smyth","doi":"10.1093/nar/gkaf018","DOIUrl":"10.1093/nar/gkaf018","url":null,"abstract":"edgeR is an R/Bioconductor software package for differential analyses of sequencing data in the form of read counts for genes or genomic features. Over the past 15 years, edgeR has been a popular choice for statistical analysis of data from sequencing technologies such as RNA-seq or ChIP-seq. edgeR pioneered the use of the negative binomial distribution to model read count data with replicates and the use of generalized linear models to analyze complex experimental designs. edgeR implements empirical Bayes moderation methods to allow reliable inference when the number of replicates is small. This article announces edgeR version 4, which includes new developments across a range of application areas. Infrastructure improvements include support for fractional counts, implementation of model fitting in C and a new statistical treatment of the quasi-likelihood pipeline that improves accuracy for small counts. The revised package has new functionality for differential methylation analysis, differential transcript expression, differential transcript and exon usage, testing relative to a fold-change threshold and pathway analysis. 
This article reviews the statistical framework and computational implementation of edgeR, briefly summarizing all the existing features and functionalities but with special attention to new features and those that have not been described previously.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"53 2","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11754124/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143024042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ahmed H Hassan, Matyas Pinkas, Chiaki Yaeshima, Sonoko Ishino, Toshio Uchiumi, Kosuke Ito, Gabriel Demo
{"title":"Novel archaeal ribosome dimerization factor facilitating unique 30S–30S dimerization","authors":"Ahmed H Hassan, Matyas Pinkas, Chiaki Yaeshima, Sonoko Ishino, Toshio Uchiumi, Kosuke Ito, Gabriel Demo","doi":"10.1093/nar/gkae1324","DOIUrl":"https://doi.org/10.1093/nar/gkae1324","url":null,"abstract":"Protein synthesis (translation) consumes a substantial proportion of cellular resources, prompting specialized mechanisms to reduce translation under adverse conditions. Ribosome inactivation often involves ribosome-interacting proteins. In both bacteria and eukaryotes, various ribosome-interacting proteins facilitate ribosome dimerization or hibernation, and/or prevent ribosomal subunits from associating, enabling the organisms to adapt to stress. Despite extensive studies on bacteria and eukaryotes, understanding factor-mediated ribosome dimerization or anti-association in archaea remains elusive. Here, we present cryo-electron microscopy structures of an archaeal 30S dimer complexed with an archaeal ribosome dimerization factor (designated aRDF), from Pyrococcus furiosus, resolved at a resolution of 3.2 Å. The complex features two 30S subunits stabilized by aRDF homodimers in a unique head-to-body architecture, which differs from the disome architecture observed during hibernation in bacteria and eukaryotes. aRDF interacts directly with eS32 ribosomal protein, which is essential for subunit association. 
The binding mode of aRDF elucidates its anti-association properties, which prevent the assembly of archaeal 70S ribosomes.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"26 1","pages":""},"PeriodicalIF":14.9,"publicationDate":"2025-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142961618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}