Lorenzo Colombini, Francesco Santoro, Mariana Tirziu, Anna Maria Cuppone, Gianni Pozzi, Francesco Iannelli
{"title":"A 69.9-kb long inverted repeat increases genome instability in a strain of <i>Lactobacillus crispatus</i>.","authors":"Lorenzo Colombini, Francesco Santoro, Mariana Tirziu, Anna Maria Cuppone, Gianni Pozzi, Francesco Iannelli","doi":"10.1093/nargab/lqaf085","DOIUrl":"10.1093/nargab/lqaf085","url":null,"abstract":"<p>Long inverted repeats (LIRs) of DNA sequences longer than 30 kb are rare in prokaryotes. Here, we identified two 69.9-kb LIRs in the genome of <i>Lactobacillus crispatus</i> M247_Siena, a derivative of strain M247. Complete genome sequence of M247_Siena was determined using Nanopore and Illumina technologies, while genome structure was analyzed using ultra-long Nanopore read mapping and polymerase chain reaction (PCR). In the parental M247 genome, there was only one copy of the 69.9-kb segment, while a 15.4-kb DNA segment was present instead of the second 69.9-kb segment copy. Both segments were delimited by the same insertion sequences (IS<i>1201</i> and IS<i>Lcr2</i>), and PCR analysis of the M247 population revealed low rates (∼1.28 per 10<sup>4</sup> chromosomes) of chromosomal rearrangements involving these regions. In contrast, the 69.9-kb LIRs in M247_Siena increased genomic instability, as evidenced by two alternative chromosomal structures detected at frequencies of 23.3% and 76.7% (∼1 out of 5 chromosomes). Comparative analysis of <i>L. crispatus</i> genomes revealed no LIRs similar to those of M247_Siena. However, long repeats of other DNA segments and chromosomal rearrangements, mostly associated with insertion sequences, were detected in 8 and 9 out of 25 <i>L. crispatus</i> genomes, respectively, highlighting genomic instability as a trait of the species.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 2","pages":"lqaf085"},"PeriodicalIF":4.0,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12199158/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144508721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yaxin Zhang, Qiqin Wu, Ying Zhou, Qingyu Cheng, Tengchuan Jin
{"title":"hDNApipe: streamlining human genome analysis and interpretation with an intuitive and user-friendly interface.","authors":"Yaxin Zhang, Qiqin Wu, Ying Zhou, Qingyu Cheng, Tengchuan Jin","doi":"10.1093/nargab/lqaf088","DOIUrl":"10.1093/nargab/lqaf088","url":null,"abstract":"<p>With the rapid evolution of next-generation sequencing technology, numerous tools have emerged across multiple stages in the human genome analysis, complicating the assembly of an appropriate pipeline. To address this challenge, there is a pressing need for an efficient and user-friendly tool that combines extensive features with intuitive operation to streamline the process. Here we introduced hDNApipe, a highly flexible end-to-end pipeline tool designed for the analysis and interpretation of human genomic sequencing data. It is developed using bash scripts and the Python standard graphical user interface library Tkinter, which endows it with excellent usability and accessibility. This pipeline directly obtains variants and associated information, and also optionally enables the visualization of variants and downstream analysis. hDNApipe features dual-mode operation with both the command-line interface and graphical user interface, and provides multiple parameter options that enable users to conduct customized analysis. It features an extraordinarily convenient installation process with a dedicated docker setup, eliminating the complexity of manually installing dependencies. It has been tested on a Linux server using publicly available data. Furthermore, benchmarking with other available pipelines was conducted from alignment to variant calling, demonstrating hDNApipe's outstanding performance in terms of time consumption, precision, and sensitivity.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 2","pages":"lqaf088"},"PeriodicalIF":4.0,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12199140/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144508722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Willem T K Maassen, Lennart F Johansson, Bart Charbon, Dennis Hendriksen, Sander van den Hoek, Mariska K Slofstra, Renée Mulder, Martine T Meems-Veldhuis, Robert Sietsma, Henny H Lemmink, Cleo C van Diemen, Mariëlle E van Gijn, Morris A Swertz, Kasper J van der Velde
{"title":"MOLGENIS VIP: an end-to-end DNA variant interpretation pipeline for research and diagnostics configurable to support rapid implementation of new methods.","authors":"Willem T K Maassen, Lennart F Johansson, Bart Charbon, Dennis Hendriksen, Sander van den Hoek, Mariska K Slofstra, Renée Mulder, Martine T Meems-Veldhuis, Robert Sietsma, Henny H Lemmink, Cleo C van Diemen, Mariëlle E van Gijn, Morris A Swertz, Kasper J van der Velde","doi":"10.1093/nargab/lqaf087","DOIUrl":"10.1093/nargab/lqaf087","url":null,"abstract":"<p>Achieving high yield in genetics research and genome diagnostics is a significant challenge because it requires a combination of multiple strategies and large-scale genomic analysis using the latest methods. Existing diagnostic software infrastructures are often unable to cope with high demands for versatility and scalability. We developed MOLGENIS VIP, a flexible, scalable, high-throughput, open-source, and \"end-to-end\" pipeline to process different types of sequencing data into portable, prioritized variant lists for immediate clinical interpretation in a wide variety of scenarios. VIP supports interpretation of short- and long-read sequencing data, using best-practice annotations and classification trees without complex IT infrastructures. VIP is developed within the long-living MOLGENIS open-source project to provide sustainability and has integrated feedback from a growing international community of users. VIP has undergone genome diagnostic laboratory testing and harnesses experiences from multiple Dutch, European, Canadian, and African diagnostic and infrastructural initiatives (VKGL, EU-Solve-RD, EJP-RD, CINECA, GA4GH). We provide a step-by-step protocol for installing and using VIP. We demonstrate VIP using 25 664 previously classified variants from the VKGL, and 18 and 41 diagnosed patients from a routine diagnostics and a Solve-RD research cohort, respectively. We believe that VIP accelerates causal variant detection and innovation in genome diagnostics and research.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 2","pages":"lqaf087"},"PeriodicalIF":4.0,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12205968/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144529960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comprehensive profiling of integrative conjugative elements (ICEs) in Mollicutes: distinct catalysts of gene flow and genome shaping.","authors":"Zili Chai, Zhiyun Guo, Xinxin Chen, Zilong Yang, Xia Wang, Fengwei Zhang, Fuqiang Kang, Wenting Liu, Shuang Liang, Hongguang Ren, Junjie Yue, Yuan Jin","doi":"10.1093/nargab/lqaf083","DOIUrl":"10.1093/nargab/lqaf083","url":null,"abstract":"<p>Mollicutes, known as the simplest bacteria with streamlined genomes, were traditionally thought to evolve mainly through gene loss. Recent studies have highlighted their rapid evolutionary capabilities and genetic exchange within individual genomes; however, their evolutionary trajectory remains elusive. By comprehensively screening 1433 available Mollicutes genomes, we revealed widespread horizontal gene transfer (HGT) in 83.9% of investigated species. These genes involve type IV secretion systems and DNA integration, inferring the unique role of integrative conjugative elements (ICEs) or integrative and mobilizable elements (IMEs) as self-transmissible genetic elements. We systematically identified 263 ICEs/IMEs across most Mollicutes genera, being intact or fragmented, showing a strong correlation with HGT frequency (cor 0.573, <i>P</i> = .002). Their transfer tendency was highlighted across species sharing ecological niches, notably in livestock-associated mycoplasmas and insect-vectored spiroplasmas. ICEs/IMEs not only act as gene shuttles ferrying various phenotypic genes, but also promote increased large-scale chromosomal transfer events, shaping the host genomes profoundly. Additionally, we provided novel evidence that <i>Ureaplasma</i> ICE facilitates genetic exchange and the spread of antibiotic resistance gene <i>tet(M)</i> among other pathogens. These findings suggest that, despite the gene-loss pressure associated with the compact genomes of Mollicutes, ICEs/IMEs play a crucial role by introducing substantial genetic resources, providing essential opportunities for evolutionary adaptation.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 2","pages":"lqaf083"},"PeriodicalIF":4.0,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12205969/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144529937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GENNUS: generative approaches for nucleotide sequences enhance mirtron classification.","authors":"Alisson Gaspar Chiquitto, Liliane Santana Oliveira, Pedro Henrique Bugatti, Priscila Tiemi Maeda Saito, Mark Basham, Roberto Tadeu Raittz, Alexandre Rossi Paschoal","doi":"10.1093/nargab/lqaf072","DOIUrl":"10.1093/nargab/lqaf072","url":null,"abstract":"<p>Classifying non-coding RNA (ncRNA) sequences, particularly mirtrons, is essential for elucidating gene regulation mechanisms. However, the prevalent class imbalance in ncRNA datasets presents significant challenges, often resulting in overfitting and diminished generalization in machine learning models. In this study, GENNUS (GENerative approaches for NUcleotide Sequences) is proposed, introducing novel data augmentation strategies using generative adversarial networks (GANs) and synthetic minority over-sampling technique (SMOTE) to enhance mirtron and canonical microRNA (miRNA) classification performance. Our GAN-based methods effectively generate high-quality synthetic data that capture the intricate patterns and diversity of real mirtron sequences, eliminating the need for extensive feature engineering. Through four experiments, it is demonstrated that models trained on a combination of real and GAN-generated data improve classification accuracy compared to traditional SMOTE techniques or only with real data. Our findings reveal that GANs enhance model performance and provide a richer representation of minority classes, thus improving generalization capabilities across various machine learning frameworks. This work highlights the transformative potential of synthetic data generation in addressing data limitations in genomics, offering a pathway for more effective and scalable mirtron and canonical miRNA classification methodologies. GENNUS is available at https://github.com/chiquitto/GENNUS; and https://doi.org/10.6084/m9.figshare.28207328.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 2","pages":"lqaf072"},"PeriodicalIF":4.0,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204755/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144529939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Studying relative RNA localization from nucleus to the cytosol.","authors":"Vasilis F Ntasis, Roderic Guigó","doi":"10.1093/nargab/lqaf032","DOIUrl":"10.1093/nargab/lqaf032","url":null,"abstract":"<p>The precise coordination of important biological processes, such as differentiation and development, relies heavily on the regulation of gene expression. In eukaryotic cells, understanding the distribution of RNA transcripts between the nucleus and cytosol is essential for gaining valuable insights into the process of protein production. The most efficient way to estimate the levels of RNA species genome-wide is through RNA sequencing (RNAseq). While RNAseq can be performed separately in the nucleus and in the cytosol, comparing transcript levels between compartments is challenging since measurements are relative to the unknown total RNA volume. Here, we show theoretically that if, in addition to nuclear and cytosolic RNAseq, whole-cell RNAseq is also performed, then accurate estimations of the localization of transcripts can be obtained. Based on this, we designed a method that estimates, first the fraction of the total RNA volume in the cytosol (nucleus), and then, this fraction for every transcript. We evaluate our methodology on simulated data and nuclear and cytosolic single-cell data available. Finally, we use our method to investigate the subcellular localization of transcripts using bulk RNAseq data from the ENCODE project.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 2","pages":"lqaf032"},"PeriodicalIF":4.0,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204760/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144529962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giulia Cesaro, Giacomo Baruzzo, Gaia Tussardi, Barbara Di Camillo
{"title":"Differential cellular communication inference framework for large-scale single-cell RNA-sequencing data.","authors":"Giulia Cesaro, Giacomo Baruzzo, Gaia Tussardi, Barbara Di Camillo","doi":"10.1093/nargab/lqaf084","DOIUrl":"10.1093/nargab/lqaf084","url":null,"abstract":"<p>Single-cell transcriptomics data have been widely used to characterize biological systems, particularly in studying cell-cell communication, which plays a significant role in many biological processes. Despite the availability of various computational tools for inferring cellular communication, quantifying variations across different experimental conditions at both intercellular and intracellular levels remains challenging. Moreover, available methods are in general limited in terms of flexibility in analyzing different experimental designs and the ability to visualize results in an easily interpretable way. Here, we present a generalizable computational framework designed to infer and support differential cellular communication analysis across two experimental conditions from large-scale single-cell transcriptomics data. The scSeqCommDiff tool employs a statistical and network-based computational approach for characterizing altered cellular cross-talk in a fast and memory-efficient way. The framework is complemented with CClens, a user-friendly Shiny app to facilitate interactive analysis of inferred cell-cell communication. Validation through spatial transcriptomics data, comparison with other tools, and application to large-scale datasets (including a cell atlas) confirms the reliability, scalability, and efficiency of the framework. Moreover, the application to a single-nucleus transcriptomics dataset shows the validity and ability of the proposed workflow to support and unravel alterations in cell-cell interactions among patients with amyotrophic lateral sclerosis and healthy subjects.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 2","pages":"lqaf084"},"PeriodicalIF":4.0,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204404/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144529938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning models for delineating marine microbial taxa.","authors":"Stilianos Louca","doi":"10.1093/nargab/lqaf090","DOIUrl":"10.1093/nargab/lqaf090","url":null,"abstract":"<p>The relationship between gene content differences and microbial taxonomic divergence remains poorly understood, and algorithms for delineating novel microbial taxa above genus level based on multiple genome similarity metrics are lacking. Addressing these gaps is important for macroevolutionary theory, biodiversity assessments, and discovery of novel taxa in metagenomes. Here, I develop machine learning classifier models, based on multiple genome similarity metrics, to determine whether any two marine bacterial and archaeal (prokaryotic) metagenome-assembled genomes (MAGs) belong to the same taxon, from the genus up to the phylum levels. Metrics include average amino acid and nucleotide identities, and fractions of shared genes within various categories, applied to 14 390 previously published non-redundant MAGs. At all taxonomic levels, the balanced accuracy (average of the true-positive and true-negative rate) of classifiers exceeded 92%, suggesting that simple genome similarity metrics serve as good taxon differentiators. Predictor selection and sensitivity analyses revealed gene categories, e.g. those involved in metabolism of cofactors and vitamins, particularly correlated to taxon divergence. Predicted taxon delineations were further used to <i>de novo</i> enumerate marine prokaryotic taxa. Statistical analyses of those enumerations suggest that over half of extant marine prokaryotic phyla, classes, and orders have already been recovered by genome-resolved metagenomic surveys.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 2","pages":"lqaf090"},"PeriodicalIF":4.0,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204397/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144529940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Miroslav Nuriddinov, Polina Belokopytova, Veniamin Fishman
{"title":"Charm is a flexible pipeline to simulate chromosomal rearrangements on Hi-C-like data.","authors":"Miroslav Nuriddinov, Polina Belokopytova, Veniamin Fishman","doi":"10.1093/nargab/lqaf081","DOIUrl":"10.1093/nargab/lqaf081","url":null,"abstract":"<p>Identifying structural variants (SVs) remains a pivotal challenge within genomic studies. The recent advent of chromosome conformation capture (3C) techniques has emerged as a promising avenue for the accurate identification of SVs. However, development and validation of computational methods leveraging 3C data necessitate comprehensive datasets of well-characterized chromosomal rearrangements, which are presently lacking. In this study, we introduce Charm (https://github.com/genomech/Charm): a robust computational framework tailored for Hi-C data simulation. Our findings demonstrate Charm's efficacy in benchmarking both novel and established tools for SV detection. Additionally, we furnish an extensive dataset of simulated Hi-C maps, paving the way for subsequent benchmarking endeavors.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 2","pages":"lqaf081"},"PeriodicalIF":4.0,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204402/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144529936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kate G Daniels, Sofia Radrizzani, Laurence D Hurst
{"title":"Why AGG is associated with high transgene output: passenger effects and their implications for transgene design.","authors":"Kate G Daniels, Sofia Radrizzani, Laurence D Hurst","doi":"10.1093/nargab/lqaf086","DOIUrl":"10.1093/nargab/lqaf086","url":null,"abstract":"<p>In bacteria, high A and low G content of the 5' end of the coding sequence (CDS) promotes low RNA stability, facilitating ribosomal initiation and subsequently a high protein to transcript ratio. Additionally, 5' NGG codons are suppressive owing to peptidyl-tRNA drop off. It was, therefore, surprising that the first large-scale transgene experiment to interrogate the 5' effect by codon randomization found the NGG, G-rich codon AGG to be the most associated with high transgene output. Why is this? We show that this is not replicated in other large transgene datasets, where AGG and NGG are associated with low efficiency. More generally, there is limited agreement between the first experiment and others. This we find to be a consequence of non-random construct design. In constructs of the first experiment, AGG disproportionately occurs with non-AGG codons associated with low stability and high protein output, making AGG's association with high output an artefact. While translationally non-optimal codons like AGG are conjectured to slow ribosomes for orderly initiation, we find that in the less biased constructs high, not low, translational adaptation in the first 10 codons is (weakly) predictive of higher translational efficiency. These results have implications for both transgene and experimental design.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 2","pages":"lqaf086"},"PeriodicalIF":4.0,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204400/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144529963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}