Sergey Margasyuk, Antonina Kuznetsova, Lev Zavileyskiy, Maria Vlasenok, Dmitry Skvortsov, Dmitri D Pervouchine
{"title":"Human introns contain conserved tissue-specific cryptic poison exons.","authors":"Sergey Margasyuk, Antonina Kuznetsova, Lev Zavileyskiy, Maria Vlasenok, Dmitry Skvortsov, Dmitri D Pervouchine","doi":"10.1093/nargab/lqae163","DOIUrl":"10.1093/nargab/lqae163","url":null,"abstract":"<p><p>Eukaryotic cells express a large number of transcripts from a single gene due to alternative splicing. Despite hundreds of thousands of splice isoforms being annotated in databases, it has been reported that the current exon catalogs remain incomplete. At the same time, introns of human protein-coding (PC) genes contain a large number of evolutionarily conserved elements with unknown function. Here, we explore the possibility that some of them represent cryptic exons that are expressed in rare conditions. We identified a group of cryptic exons that are similar to the annotated exons in terms of evolutionary conservation and RNA-seq read coverage in the Genotype-Tissue Expression dataset. Most of them were poison, i.e. generated a nonsense-mediated decay (NMD) isoform upon inclusion, and many showed signs of tissue-specific and cancer-specific expression and regulation. We performed RNA-seq in the A549 cell line treated with cycloheximide to inactivate NMD and confirmed using quantitative polymerase chain reaction that seven of eight exons tested are, indeed, expressed. 
This study shows that introns of human PC genes contain cryptic poison exons, which reside in conserved intronic regions and remain not fully annotated due to insufficient representation in RNA-seq libraries.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae163"},"PeriodicalIF":4.0,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11632617/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142814423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jody E Phelan, Fatima Niazi, Linfeng Wang, Gabrielle C Ngwana-Joseph, Benjamin Sobkowiak, Ted Cohen, Susana Campino, Taane G Clark
{"title":"TGV: suite of tools to visualize transmission graphs.","authors":"Jody E Phelan, Fatima Niazi, Linfeng Wang, Gabrielle C Ngwana-Joseph, Benjamin Sobkowiak, Ted Cohen, Susana Campino, Taane G Clark","doi":"10.1093/nargab/lqae158","DOIUrl":"10.1093/nargab/lqae158","url":null,"abstract":"<p><p>Graph structures are often used to visualize transmission networks generated using genomic epidemiological methods. However, tools to interactively visualize these graphs do not exist. A browser-based tool allowing users to load and interactively visualize transmission graphs was developed in JavaScript. Associated metadata can be loaded and used to annotate and filter the nodes and edges of transmission networks. The tool is available at jodyphelan.github.io/tgv.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae158"},"PeriodicalIF":4.0,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11630274/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142807496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guy Karlebach, Peter Hansen, Kristin Köhler, Peter N Robinson
{"title":"IsopretGO-analysing and visualizing the functional consequences of differential splicing.","authors":"Guy Karlebach, Peter Hansen, Kristin Köhler, Peter N Robinson","doi":"10.1093/nargab/lqae165","DOIUrl":"10.1093/nargab/lqae165","url":null,"abstract":"<p><p>Gene Ontology overrepresentation analysis (GO-ORA) is a standard approach towards characterizing salient functional characteristics of sets of differentially expressed genes (DGE) in RNA sequencing (RNA-seq) experiments. GO-ORA compares the distribution of GO annotations of the DGE to that of all genes or all expressed genes. This approach has not been available to characterize differential alternative splicing (DAS). Here, we introduce a desktop application called isopretGO for visualizing the functional implications of DGE and DAS that leverages our previously published machine-learning predictions of GO annotations for individual isoforms. We show based on an analysis of 100 RNA-seq datasets that DAS and DGE frequently have starkly different functional profiles. We present an example that shows how isopretGO can be used to identify functional shifts in RNA-seq data that can be attributed to differential splicing.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae165"},"PeriodicalIF":4.0,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11630322/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142807412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AnnoGCD: a generalized category discovery framework for automatic cell type annotation.","authors":"Francesco Ceccarelli, Pietro Liò, Sean B Holden","doi":"10.1093/nargab/lqae166","DOIUrl":"10.1093/nargab/lqae166","url":null,"abstract":"<p><p>The identification of cell types in single-cell RNA sequencing (scRNA-seq) data is a critical task in understanding complex biological systems. Traditional supervised machine learning methods rely on large, well-labeled datasets, which are often impractical to obtain in open-world scenarios due to budget constraints and incomplete information. To address these challenges, we propose a novel computational framework, named AnnoGCD, building on Generalized Category Discovery (GCD) and Anomaly Detection (AD) for automatic cell type annotation. Our semi-supervised method combines labeled and unlabeled data to accurately classify known cell types and to discover novel ones, even in imbalanced datasets. AnnoGCD includes a semi-supervised block to first classify known cell types, followed by an unsupervised block aimed at identifying and clustering novel cell types. We evaluated our approach on five human scRNA-seq datasets and a mouse model atlas, demonstrating superior performance in both known and novel cell type identification compared to existing methods. Our model also exhibited robustness in datasets with significant class imbalance. The results suggest that AnnoGCD is a powerful tool for the automatic annotation of cell types in scRNA-seq data, providing a scalable solution for biological research and clinical applications. 
Our code and the datasets used for evaluations are publicly available on GitHub: https://github.com/cecca46/AnnoGCD/.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae166"},"PeriodicalIF":4.0,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11629990/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142807220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peter Hansen, Hannah Blau, Jochen Hecht, Guy Karlebach, Alexander Krannich, Robin Steinhaus, Matthias Truss, Peter N Robinson
{"title":"Using paired-end read orientations to assess technical biases in capture Hi-C.","authors":"Peter Hansen, Hannah Blau, Jochen Hecht, Guy Karlebach, Alexander Krannich, Robin Steinhaus, Matthias Truss, Peter N Robinson","doi":"10.1093/nargab/lqae156","DOIUrl":"10.1093/nargab/lqae156","url":null,"abstract":"<p><p>Hi-C and capture Hi-C (CHi-C) both leverage paired-end sequencing of chimeric fragments to gauge the strength of interactions based on the total number of paired-end reads mapped to a common pair of restriction fragments. Mapped paired-end reads can have four relative orientations, depending on the genomic positions and strands of the two reads. We assigned one paired-end read orientation to each of the four possible re-ligations that can occur between two given restriction fragments. In a large hematopoietic cell dataset, we determined the read pair counts of interactions separately for each orientation. Interactions with imbalances in the counts occur much more often than expected by chance for both Hi-C and CHi-C. Based on such imbalances, we identified target restriction fragments enriched at only one instead of both ends. By matching them to the baits used for the experiments, we confirmed our assignment of paired-end read orientations and gained insights that can inform bait design. An analysis of unbaited fragments shows that, beyond bait effects, other known types of technical biases are reflected in count imbalances. Taking advantage of distance-dependent contact frequencies, we assessed the impact of such biases. 
Our results have the potential to improve the design and interpretation of CHi-C experiments.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae156"},"PeriodicalIF":4.0,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11630073/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142807630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting the pro-longevity or anti-longevity effect of model organism genes with enhanced Gaussian noise augmentation-based contrastive learning on protein-protein interaction networks.","authors":"Ibrahim Alsaggaf, Alex A Freitas, Cen Wan","doi":"10.1093/nargab/lqae153","DOIUrl":"10.1093/nargab/lqae153","url":null,"abstract":"<p><p>Ageing is a highly complex and important biological process that plays major roles in many diseases. Therefore, it is essential to better understand the molecular mechanisms of ageing-related genes. In this work, we proposed a novel enhanced Gaussian noise augmentation-based contrastive learning (EGsCL) framework to predict the pro-longevity or anti-longevity effect of four model organisms' ageing-related genes by exploiting protein-protein interaction (PPI) networks. The experimental results suggest that EGsCL successfully outperformed the conventional Gaussian noise augmentation-based contrastive learning methods and obtained state-of-the-art performance on three model organisms' predictive tasks when merely relying on PPI network data. In addition, we use EGsCL to predict 10 novel pro-/anti-longevity mouse genes and discuss the support for these predictions in the literature.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae153"},"PeriodicalIF":4.0,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11616696/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142781309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rodrigo Galindo-Murillo, Jack S Cohen, Barak Akabayov
{"title":"Comparative molecular dynamics calculations of duplexation of chemically modified analogs of DNA used for antisense applications.","authors":"Rodrigo Galindo-Murillo, Jack S Cohen, Barak Akabayov","doi":"10.1093/nargab/lqae155","DOIUrl":"10.1093/nargab/lqae155","url":null,"abstract":"<p><p>We have subjected several analogs of DNA that have been widely used as antisense oligonucleotide (ASO) inhibitors of gene expression to comparative molecular dynamics (MD) calculations of their ability to form duplexes with DNA and RNA. The analogs included in this study are the phosphorothioate (PS), peptide nucleic acid (PNA), locked nucleic acid (LNA), morpholino nucleic acid (PMO), the 2'-OMe, 2'-F, 2'-methoxyethyl (2'-MOE) and the constrained cET analogs, as well as the natural phosphodiester (PO) as control, for a total of nine structures, in both XNA-DNA and XNA-RNA duplexes. This is intended as an objective criterion for their relative ability to duplex with an RNA complement and their comparative potential for antisense applications. We have found that the constrained furanose ring analogs show increased stability when considering this study's structural and energetic parameters. The 2'-MOE modification, even though energetically stable, has an elevated dynamic range and breathing properties due to the bulkier moiety in the C2' position of the furanose. The smaller modifications in the C2' position, 2'-F, 2'-OMe and PS also form stable and energetically favored duplexes with both DNA and RNA. The morpholino moiety allows for increased tolerance in accommodating either DNA or RNA and the PNA, with the PNA being the most energetically stable, although with a preference for the B-form DNA. 
In summary, we can rank the overall preference of hybrid strand formations as PNA > cET/LNA > PS/2'-F/2'-OMe > morpholino > 2'-MOE for the efficacy of duplex formation.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae155"},"PeriodicalIF":4.0,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11616695/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142781298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lara Fuhrmann, Benjamin Langer, Ivan Topolsky, Niko Beerenwinkel
{"title":"VILOCA: sequencing quality-aware viral haplotype reconstruction and mutation calling for short-read and long-read data.","authors":"Lara Fuhrmann, Benjamin Langer, Ivan Topolsky, Niko Beerenwinkel","doi":"10.1093/nargab/lqae152","DOIUrl":"10.1093/nargab/lqae152","url":null,"abstract":"<p><p>RNA viruses exist as large heterogeneous populations within their host. The structure and diversity of virus populations affect disease progression and treatment outcomes. Next-generation sequencing allows detailed viral population analysis, but inferring diversity from error-prone reads is challenging. Here, we present VILOCA (VIral LOcal haplotype reconstruction and mutation CAlling for short and long read data), a method for mutation calling and reconstruction of local haplotypes from short- and long-read viral sequencing data. Local haplotypes refer to genomic regions that have approximately the length of the input reads. VILOCA recovers local haplotypes by using a Dirichlet process mixture model to cluster reads around their unobserved haplotypes and leveraging quality scores of the sequencing reads. We assessed the performance of VILOCA in terms of mutation calling and haplotype reconstruction accuracy on simulated and experimental Illumina, PacBio and Oxford Nanopore data. On simulated and experimental Illumina data, VILOCA performed better than or similarly to existing methods. On the simulated long-read data, VILOCA is able to recover on average [Formula: see text] of the ground truth mutations with perfect precision compared to only [Formula: see text] recall and [Formula: see text] precision of the second-best method. 
In summary, VILOCA provides significantly improved accuracy in mutation and haplotype calling, especially for long-read sequencing data, and therefore facilitates the comprehensive characterization of heterogeneous within-host viral populations.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae152"},"PeriodicalIF":4.0,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11616694/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142781326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhigang Wang, Yize Yuan, Zhe Wang, Wenjia Zhang, Chong Chen, Zhaojun Duan, Suyuan Peng, Jie Zheng, Yongqun He, Xiaolin Yang
{"title":"CancerPro: deciphering the pan-cancer prognostic landscape through combinatorial enrichment analysis and knowledge network insights.","authors":"Zhigang Wang, Yize Yuan, Zhe Wang, Wenjia Zhang, Chong Chen, Zhaojun Duan, Suyuan Peng, Jie Zheng, Yongqun He, Xiaolin Yang","doi":"10.1093/nargab/lqae157","DOIUrl":"10.1093/nargab/lqae157","url":null,"abstract":"<p><p>Gene expression levels serve as valuable markers for assessing prognosis in cancer patients. To understand the mechanisms underlying prognosis and explore potential therapeutics across diverse cancers, we developed CancerPro (https://medcode.link/cancerpro). This knowledge network platform integrates comprehensive biomedical data on genes, drugs, diseases and pathways, along with their interactions. By integrating ontology and knowledge graph technologies, CancerPro offers a user-friendly interface for analyzing pan-cancer prognostic markers and exploring genes or drugs of interest. CancerPro implements three core functions: gene set enrichment analysis based on multiple annotations; in-depth drug analysis; and in-depth gene list analysis. Using CancerPro, we categorized genes and cancers into distinct groups and utilized network analysis to identify key biological pathways associated with unfavorable prognostic genes. The platform further pinpoints potential drug targets and explores potential links between prognostic markers and patient characteristics such as glutathione levels and obesity. For renal and prostate cancer, CancerPro identified risk genes linked to immune deficiency pathways and alternative splicing abnormalities. This research highlights CancerPro's potential as a valuable tool for researchers to explore pan-cancer prognostic markers and uncover novel therapeutic avenues. 
Its flexible tools support a wide range of biological investigations, making it a versatile asset in cancer research and beyond.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae157"},"PeriodicalIF":4.0,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11616677/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142781294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haoyang Zhang, Muhammad Kabir, Saeed Ahmed, Mauno Vihinen
{"title":"There will always be variants of uncertain significance. Analysis of VUSs.","authors":"Haoyang Zhang, Muhammad Kabir, Saeed Ahmed, Mauno Vihinen","doi":"10.1093/nargab/lqae154","DOIUrl":"10.1093/nargab/lqae154","url":null,"abstract":"<p><p>The ACMG/AMP guidelines include five categories, of which variants of uncertain significance (VUSs) have received increasing attention. Recently, Fowler and Rehm claimed that all or most VUSs could be reclassified as pathogenic or benign within a few years. To test this claim, we collected validated benign, pathogenic, VUS and conflicting variants from ClinVar and LOVD and investigated differences at gene, protein, structure, and variant levels. The gene and protein features included inheritance patterns, actionability, functional categories for housekeeping, essential, complete knockout, lethality and haploinsufficient proteins, Gene Ontology annotations, and protein network properties. Structural properties included the location at secondary structural elements, intrinsically disordered regions, transmembrane regions, repeats, conservation, and accessibility. Gene features were distributions of nucleotides, their groupings, codons, and location relative to CpG islands. The distributions of amino acids and their groups were investigated. VUSs did not markedly differ from other variants. The only major differences were the accessibility and conservation of pathogenic variants, and a reduced ratio of repeat-locating variants in VUSs. Thus, all VUSs cannot be distinguished from other types of variants. They display one form of natural biological heterogeneity. 
Instead of concentrating on eradicating VUSs, the community would benefit from investigating and understanding factors that contribute to phenotypic heterogeneity.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae154"},"PeriodicalIF":4.0,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11616676/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142781322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}