Seyed Mohammad Amin Taheri Ghahfarokhi, Lourdes Peña-Castillo
{"title":"BacTermFinder: a comprehensive and general bacterial terminator finder using a CNN ensemble.","authors":"Seyed Mohammad Amin Taheri Ghahfarokhi, Lourdes Peña-Castillo","doi":"10.1093/nargab/lqaf016","DOIUrl":"10.1093/nargab/lqaf016","url":null,"abstract":"<p><p>A terminator is a DNA region that ends the transcription process. Currently, multiple computational tools are available for predicting bacterial terminators. However, these methods are specialized for certain bacteria or terminator types (i.e. intrinsic or factor-dependent). In this work, we developed BacTermFinder using an ensemble of convolutional neural networks (CNNs) receiving as input four different representations of terminator sequences. To develop BacTermFinder, we collected roughly 41 000 bacterial terminators (intrinsic and factor-dependent) of 22 species with varying GC-content (from 28% to 71%) from published studies that used RNA-seq technologies. We evaluated BacTermFinder's performance on terminators of five bacterial species (not used for training BacTermFinder) and two archaeal species. BacTermFinder's performance was compared with that of four other bacterial terminator prediction tools. Based on our results, BacTermFinder outperforms all four other approaches in terms of average recall without increasing the number of false positives. Moreover, BacTermFinder identifies both types of terminators (intrinsic and factor-dependent) and generalizes to archaeal terminators. Additionally, we visualized the saliency map of the CNNs to gain insights into terminator motifs per species. 
BacTermFinder is publicly available at https://github.com/BioinformaticsLabAtMUN/BacTermFinder.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 1","pages":"lqaf016"},"PeriodicalIF":4.0,"publicationDate":"2025-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11890068/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143587415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yao Chen, Ward De Spiegelaere, Wim Trypsteen, Jo Vandesompele, Gertjan Wils, David Gleerup, Antoon Lievens, Olivier Thas, Matthijs Vynck
{"title":"Polytect: an automatic clustering and labeling method for multicolor digital PCR data.","authors":"Yao Chen, Ward De Spiegelaere, Wim Trypsteen, Jo Vandesompele, Gertjan Wils, David Gleerup, Antoon Lievens, Olivier Thas, Matthijs Vynck","doi":"10.1093/nargab/lqaf015","DOIUrl":"10.1093/nargab/lqaf015","url":null,"abstract":"<p><p>Digital polymerase chain reaction (dPCR) is a state-of-the-art targeted quantification method of nucleic acids. The technology is based on massive partitioning of a reaction mixture into individual PCR reactions. The resulting partition-level end-point fluorescence intensities are used to classify partitions as positive or negative, i.e. containing or not containing the target nucleic acid(s). Many automatic dPCR partition classification methods have been proposed, but they are limited to the analysis of single- or dual-color dPCR data. While general-purpose or flow cytometry clustering methods can be directly applied to multicolor dPCR data, these methods do not exploit the approximate prior knowledge on cluster center locations available in dPCR data. We present Polytect, a method that relies on crude cluster results from flowPeaks, previously shown to offer good partition classification performance, and subsequently refines flowPeaks' results by automatic cluster merging and cluster labeling, exploiting the prior knowledge on cluster center locations. Comparative analyses with established methods such as flowPeaks, dpcp, and ddPCRclust reveal that Polytect often surpasses established methods, both on empirical and simulated data. Polytect manages to merge excess clusters, while also successfully identifying empty clusters when fewer than the maximally observable number of clusters are present. On par with recent developments in instruments, Polytect extends beyond two-color data. 
The method is available as an R package and R Shiny app (https://digpcr.shinyapps.io/Polytect/).</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 1","pages":"lqaf015"},"PeriodicalIF":4.0,"publicationDate":"2025-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11890064/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143587420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patrick Kück, Mark Wilkinson, Juliane Romahn, Nathan I Seidel, Karen Meusemann, Johann W Wägele
{"title":"Unraveling myriapod evolution: sealion, a novel quartet-based approach for evaluating phylogenetic uncertainty.","authors":"Patrick Kück, Mark Wilkinson, Juliane Romahn, Nathan I Seidel, Karen Meusemann, Johann W Wägele","doi":"10.1093/nargab/lqaf018","DOIUrl":"10.1093/nargab/lqaf018","url":null,"abstract":"<p><p>Myriapods, a diverse group of terrestrial arthropods, comprise four main subgroups: Chilopoda (centipedes), Diplopoda (millipedes), Pauropoda, and Symphyla. Recent phylogenomic studies affirm Myriapoda's monophyly and the monophyletic status of each subgroup but differ in their relationships. To investigate these relationships further, we reanalyzed a transcriptomic dataset of 59 species across 292 single-copy protein-coding genes. Departing from conventional methods, we employed a novel approach that relies on information from polarized quartets (i.e., sets of four orthologous sequences, with one being an outgroup) to evaluate molecular phylogenies. This Hennigian analysis reduces misleading phylogenetic signals in molecular data caused by convergence, plesiomorphy, and rate heterogeneity across sites and across lineages. Our findings reveal that some species, especially those with long root-to-tip distances, disproportionately contribute misleading signals. Analyses using conventional likelihood-based phylogenetic methods suggest that Chilopoda and Diplopoda are sister taxa. By contrast, analyses incorporating novel filters designed to minimize conflict among phylogenetically confounding signals support the monophyly of Progoneata, aligning with morphological evidence. 
Simulations validate the reliability of our approach, demonstrating its potential to resolve myriapod evolutionary relationships and highlight uncertainty.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 1","pages":"lqaf018"},"PeriodicalIF":4.0,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11886814/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143587423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cell type-dependent directional transcription at enhancers.","authors":"Saumya Agrawal, Emi Kanamaru, Yoriko Saito, Fumihiko Ishikawa, Michiel de Hoon","doi":"10.1093/nargab/lqaf007","DOIUrl":"10.1093/nargab/lqaf007","url":null,"abstract":"<p><p>Enhancers are noncoding regulatory regions in the genome that play essential roles in modulating gene expression. Previous work showed that enhancers are not transcriptionally silent but are characterized by bidirectional expression of short capped noncoding RNAs. Balanced bidirectional expression has therefore been used as a key feature for the detection of enhancers from transcriptome data. Instead, by analyzing FANTOM5 and other deep cap analysis gene expression transcriptome datasets, we find enhancer transcription preferentially in one direction in individual cell types. As the preferred direction of transcription of an enhancer can switch between cell types, balanced bidirectional enhancer expression may appear if transcriptome data are aggregated over cell types. 5' single-cell RNA sequencing data showed that enhancers were almost exclusively expressed unidirectionally in a single cell. Reporter assay data demonstrated that the regulatory function of an enhancer does not depend on its preference for unidirectional or bidirectional expression. 
We conclude that requiring balanced bidirectional transcription for enhancer detection may discard most valid enhancers when applied to transcriptome data of a single cell type.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 1","pages":"lqaf007"},"PeriodicalIF":4.0,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11886823/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143587418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The zebrafish (<i>Danio rerio</i>) snoRNAome.","authors":"Renáta Hamar, Máté Varga","doi":"10.1093/nargab/lqaf013","DOIUrl":"10.1093/nargab/lqaf013","url":null,"abstract":"<p><p>Small nucleolar RNAs (snoRNAs) are among the most abundant and evolutionarily ancient groups of functional non-coding RNAs. They were originally described as guides of post-transcriptional rRNA modifications, but emerging evidence suggests that snoRNAs fulfil an impressive variety of cellular functions. To reveal the true complexity of snoRNA-dependent functions, we first need to catalogue the complete repertoire of snoRNAs in a given cellular context. While the systematic mapping and characterization of \"snoRNAomes\" for some species have been described recently, this has not been done hitherto for the zebrafish (<i>Danio rerio</i>). Using size-fractionated RNA sequencing data from adult zebrafish tissues, we created an interactive \"snoRNAome\" database for this species. Our custom-designed analysis pipeline allowed us to identify with high confidence 67 previously unannotated snoRNAs in the zebrafish genome, resulting in the most complete set of snoRNAs to date in this species. Reanalyzing multiple previously published datasets, we also provide evidence for the dynamic expression of some snoRNAs during the early stages of zebrafish development and tissue-specific expression patterns for others in adults. 
To facilitate further investigations into the functions of snoRNAs in zebrafish, we created a novel interactive database, snoDanio, which can be used to explore small RNA expression from transcriptomic data.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 1","pages":"lqaf013"},"PeriodicalIF":4.0,"publicationDate":"2025-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11880993/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143567606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dustin J Sokolowski, Mingjie Mai, Arnav Verma, Gabriela Morgenshtern, Vallijah Subasri, Hareem Naveed, Maria Yampolsky, Michael D Wilson, Anna Goldenberg, Lauren Erdman
{"title":"iModEst: disentangling -omic impacts on gene expression variation across genes and tissues.","authors":"Dustin J Sokolowski, Mingjie Mai, Arnav Verma, Gabriela Morgenshtern, Vallijah Subasri, Hareem Naveed, Maria Yampolsky, Michael D Wilson, Anna Goldenberg, Lauren Erdman","doi":"10.1093/nargab/lqaf011","DOIUrl":"10.1093/nargab/lqaf011","url":null,"abstract":"<p><p>Many regulatory factors impact the expression of individual genes including, but not limited to, microRNA, long non-coding RNA (lncRNA), transcription factors (TFs), <i>cis-</i>methylation, copy number variation (CNV), and single-nucleotide polymorphisms (SNPs). While each mechanism can influence gene expression substantially, the relative importance of each mechanism at the level of individual genes and tissues is poorly understood. Here, we present the integrative Models of Estimated gene expression (iModEst), which details the relative contribution of different regulators to the gene expression of 16,000 genes and 21 tissues within The Cancer Genome Atlas (TCGA). Specifically, we derive predictive models of gene expression using tumour data and test their predictive accuracy in cancerous and tumour-adjacent tissues. Our models can explain up to 70% of the variance in gene expression across 43% of the genes within both tumour and tumour-adjacent tissues. We confirm that TF expression best predicts gene expression in both tumour and tumour-adjacent tissue, whereas methylation-based predictive models trained in tumour tissues do not transfer well to tumour-adjacent tissues. We find new patterns and recapitulate previously reported relationships between regulators and gene expression, such as CNV-predicted <i>FGFR2</i> expression and SNP-predicted <i>TP63</i> expression. 
Together, iModEst offers an interactive, comprehensive atlas of individual regulator-gene-tissue expression relationships as well as relationships between regulators.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 1","pages":"lqaf011"},"PeriodicalIF":4.0,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11879402/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143558141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nikol Chantzi, Candace S Y Chan, Michail Patsakis, Akshatha Nayak, Austin Montgomery, Ioannis Mouratidis, Ilias Georgakopoulos-Soares
{"title":"Ribosomal DNA arrays are the most H-DNA rich element in the human genome.","authors":"Nikol Chantzi, Candace S Y Chan, Michail Patsakis, Akshatha Nayak, Austin Montgomery, Ioannis Mouratidis, Ilias Georgakopoulos-Soares","doi":"10.1093/nargab/lqaf012","DOIUrl":"10.1093/nargab/lqaf012","url":null,"abstract":"<p><p>Repetitive DNA sequences can form noncanonical structures such as H-DNA. The new telomere-to-telomere genome assembly for the human genome has eliminated gaps, enabling examination of highly repetitive regions including centromeric and pericentromeric repeats and ribosomal DNA arrays. We find that H-DNA appears once every 25 000 base pairs in the human genome. Its distribution is highly inhomogeneous with H-DNA motif hotspots being detectable in acrocentric chromosomes. Ribosomal DNA arrays are the most H-DNA-rich genomic element, with a 40.94-fold H-DNA enrichment. Across acrocentric chromosomes, we report that 54.82% of H-DNA motifs found in these chromosomes are in rDNA array loci. We discover that binding sites for the PRDM9-B allele, a variant of the PRDM9 protein, are enriched for H-DNA motifs. We further investigate these findings through an analysis of PRDM9 ChIP-seq data across various PRDM9 alleles, observing an enrichment of H-DNA motifs in the binding sites of A-like alleles (including A, B, and N alleles), but not C-like alleles (including C and L4 alleles). The enrichment of H-DNA motifs at ribosomal DNA arrays is consistent in nonhuman great ape genomes. 
We conclude that ribosomal DNA arrays are the most enriched genomic loci for H-DNA sequences in human and other great ape genomes.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 1","pages":"lqaf012"},"PeriodicalIF":4.0,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11879447/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143558149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kamil Steczkiewicz, Aleksander Kossakowski, Stanisław Janik, Anna Muszewska
{"title":"Low-complexity regions in fungi display functional groups and are depleted in positively charged amino acids.","authors":"Kamil Steczkiewicz, Aleksander Kossakowski, Stanisław Janik, Anna Muszewska","doi":"10.1093/nargab/lqaf014","DOIUrl":"10.1093/nargab/lqaf014","url":null,"abstract":"<p><p>Reports on the diversity and occurrence of low-complexity regions (LCRs) in Eukaryota are limited. Some studies have provided a more extensive characterization of LCR proteins in prokaryotes. There is a growing body of knowledge about a plethora of biological functions attributable to LCRs. However, it is hard to determine to what extent observed phenomena apply to fungi since most studies of fungal LCRs were limited to model yeasts. To fill this gap, we performed a survey of LCRs in proteins across all branches of the fungal tree of life. We show that the abundance of LCRs and the abundance of proteins with LCRs are positively correlated with proteome size. We observed that most LCRs are present in proteins with protein domains but do not overlap with the domain regions. LCRs are associated with many duplicated protein domains. The quantity of particular amino acids in LCRs deviates from the background frequency with a clear over-representation of amino acids with functional groups and a negative charge. Moreover, we discovered that each lineage of fungi favors distinct LCR expansions. 
Early diverging fungal lineages differ in LCR abundance and composition pointing at a different evolutionary trajectory of each fungal group.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 1","pages":"lqaf014"},"PeriodicalIF":4.0,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11878562/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143558146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maxwell L Neal, Sanjeev K Choudhry, John D Aitchison
{"title":"DeleteomeTools: utilizing a compendium of yeast deletion strain transcriptomes to identify co-functional genes.","authors":"Maxwell L Neal, Sanjeev K Choudhry, John D Aitchison","doi":"10.1093/nargab/lqaf008","DOIUrl":"10.1093/nargab/lqaf008","url":null,"abstract":"<p><p>We introduce DeleteomeTools, an R package that leverages the Deleteome compendium of yeast single-gene deletion transcriptomes to predict gene function. Primarily, the package provides functions for identifying similarities between the transcriptomic signatures of deletion strains, thereby associating genes of interest with others that may be functionally related. We describe how our software predicted a novel relationship between the yeast nucleoporin Nup170 and the Ctf18-RFC complex, which was confirmed experimentally, revealing a previously unknown link between nuclear pore complexes and the DNA replication machinery. To assess the package's broader predictive capabilities, we performed a systematic evaluation that tested how well it predicted Gene Ontology (GO) annotations already applied to the subset of genes deleted in Deleteome strains. We show that our package predicted a majority of reported GO:<i>biological process</i> annotations with semantic similarities ranging from moderate to identical. 
We also discuss how our strategy for quantifying similarity between deletion strains, which relies on differential expression signatures, differs from other approaches that use global expression profiles, and why it has the potential to identify functional relationships that might otherwise go undetected.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 1","pages":"lqaf008"},"PeriodicalIF":4.0,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11878635/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143558137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniil A Khlebnikov, Arina I Nikolskaya, Anastasia A Zharikova, Andrey A Mironov
{"title":"Comprehensive analysis of RNA-chromatin, RNA-, and DNA-protein interactions.","authors":"Daniil A Khlebnikov, Arina I Nikolskaya, Anastasia A Zharikova, Andrey A Mironov","doi":"10.1093/nargab/lqaf010","DOIUrl":"10.1093/nargab/lqaf010","url":null,"abstract":"<p><p>RNA-chromatin interactome data are considered to be one of the noisiest types of data in biology. This is due to protein-coding RNA contacts and nonspecific interactions between RNA and chromatin caused by protocol specifics. Therefore, finding regulatory interactions between certain transcripts and genome loci requires a wide range of filtering techniques to obtain significant results. Using data on pairwise interactions between these molecules, we propose a concept of triad interaction involving RNA, protein, and a DNA locus. The constructed triads show significantly fewer noise contacts and are more significant when compared to a background model for generating pairwise interactions. RNA-chromatin contact data can be used to validate the proposed triad object as positive (Red-ChIP experiment) or negative (RADICL-seq NPM) controls. Our approach also filters RNA-chromatin contacts in chromatin regions associated with protein functions based on ChromHMM annotation.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"7 1","pages":"lqaf010"},"PeriodicalIF":4.0,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11850300/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143504711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}