{"title":"Intronic RNA secondary structural information captured for the human <i>MYC</i> pre-mRNA.","authors":"Taylor O Eich, Collin A O'Leary, Walter N Moss","doi":"10.1093/nargab/lqae143","DOIUrl":"10.1093/nargab/lqae143","url":null,"abstract":"<p><p>To address the lack of intronic reads in secondary structure probing data for the human <i>MYC</i> pre-mRNA, we developed a method that combines spliceosomal inhibition with RNA probing and sequencing. Here, the SIRP-seq method was applied to study the secondary structure of human <i>MYC</i> RNAs by chemically probing HeLa cells with dimethyl sulfate in the presence of the small molecule spliceosome inhibitor pladienolide B. Pladienolide B binds to the SF3B complex of the spliceosome to inhibit intron removal during splicing, resulting in retained intronic sequences. This method was used to increase the read coverage over intronic regions of <i>MYC</i>. The purpose for increasing coverage across introns was to generate complete reactivity profiles for intronic sequences via the DMS-MaPseq approach. Notably, depth was sufficient for analysis by the program DRACO, which was able to deduce distinct reactivity profiles and predict multiple secondary structural conformations as well as their suggested stoichiometric abundances. The results presented here provide a new method for intronic RNA secondary structural analyses, as well as specific structural insights relevant to <i>MYC</i> RNA splicing regulation and therapeutic targeting.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae143"},"PeriodicalIF":4.0,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11500451/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142509478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graphite: painting genomes using a colored de Bruijn graph.","authors":"Rick Beeloo, Aldert L Zomer, Sebastian Deorowicz, Bas E Dutilh","doi":"10.1093/nargab/lqae142","DOIUrl":"https://doi.org/10.1093/nargab/lqae142","url":null,"abstract":"<p><p>The recent growth of microbial sequence data allows comparisons at unprecedented scales, enabling the tracking of strains, mobile genetic elements, or genes. Querying a genome against a large reference database can easily yield thousands of matches that are tedious to interpret and pose computational challenges. We developed Graphite that uses a colored de Bruijn graph (cDBG) to paint query genomes, selecting the local best matches along the full query length. By focusing on the best genomic match of each query region, Graphite reduces the number of matches while providing the most promising leads for sequence tracking or genomic forensics. When applied to hundreds of <i>Campylobacter</i> genomes we found extensive gene sharing, including a previously undetected <i>C. coli</i> plasmid that matched a <i>C. jejuni</i> chromosome. Together, genome painting using cDBGs as enabled by Graphite, can reveal new biological phenomena by mitigating computational hurdles.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae142"},"PeriodicalIF":4.0,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11497850/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142509477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MoNETA: MultiOmics Network Embedding for SubType Analysis.","authors":"Giovanni Scala, Luigi Ferraro, Aurora Brandi, Yan Guo, Barbara Majello, Michele Ceccarelli","doi":"10.1093/nargab/lqae141","DOIUrl":"10.1093/nargab/lqae141","url":null,"abstract":"<p><p>Cells are complex systems whose behavior emerges from a huge number of reactions taking place within and among different molecular districts. The availability of bulk and single-cell omics data fueled the creation of multi-omics systems biology models capturing the dynamics within and between omics layers. Powerful modeling strategies are needed to cope with the increased amount of data to be interrogated and the relative research questions. Here, we present MultiOmics Network Embedding for SubType Analysis (MoNETA) for fast and scalable identification of relevant multi-omics relationships between biological entities at the bulk and single-cells level. We apply MoNETA to show how glioma subtypes previously described naturally emerge with our approach. We also show how MoNETA can be used to identify cell types in five multi-omic single-cell datasets.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae141"},"PeriodicalIF":4.0,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11482636/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142476446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"bcRflow: a Nextflow pipeline for characterizing B cell receptor repertoires from non-targeted transcriptomic data.","authors":"Brent T Schlegel, Michael Morikone, Fangping Mu, Wan-Yee Tang, Gary Kohanbash, Dhivyaa Rajasundaram","doi":"10.1093/nargab/lqae137","DOIUrl":"10.1093/nargab/lqae137","url":null,"abstract":"<p><p>B cells play a critical role in the adaptive recognition of foreign antigens through diverse receptor generation. While targeted immune sequencing methods are commonly used to profile B cell receptors (BCRs), they have limitations in cost and tissue availability. Analyzing B cell receptor profiling from non-targeted transcriptomics data is a promising alternative, but a systematic pipeline integrating tools for accurate immune repertoire extraction is lacking. Here, we present bcRflow, a Nextflow pipeline designed to characterize BCR repertoires from non-targeted transcriptomics data, with functional modules for alignment, processing, and visualization. bcRflow is a comprehensive, reproducible, and scalable pipeline that can run on high-performance computing clusters, cloud-based computing resources like Amazon Web Services (AWS), the Open OnDemand framework, or even local desktops. bcRflow utilizes institutional configurations provided by nf-core to ensure maximum portability and accessibility. To demonstrate the functionality of the bcRflow pipeline, we analyzed a public dataset of bulk transcriptomic samples from COVID-19 patients and healthy controls. We have shown that bcRflow streamlines the analysis of BCR repertoires from non-targeted transcriptomics data, providing valuable insights into the B cell immune response for biological and clinical research. bcRflow is available at https://github.com/Bioinformatics-Core-at-Childrens/bcRflow.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae137"},"PeriodicalIF":4.0,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11474772/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142476445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic configuration and data security for bioinformatics cloud services with the Laniakea Dashboard.","authors":"Marco Antonio Tangaro, Marica Antonacci, Giacinto Donvito, Nadina Foggetti, Pietro Mandreoli, Daniele Colombo, Graziano Pesole, Federico Zambelli","doi":"10.1093/nargab/lqae140","DOIUrl":"10.1093/nargab/lqae140","url":null,"abstract":"<p><p>Technological advances in high-throughput technologies improve our ability to explore the molecular mechanisms of life. Computational infrastructures for scientific applications fulfil a critical role in harnessing this potential. However, there is an ongoing need to improve accessibility and implement robust data security technologies to allow the processing of sensitive data, particularly human genetic data. Scientific clouds have emerged as a promising solution to meet these needs. We present three components of the Laniakea software stack, initially developed to support the provision of private on-demand Galaxy instances. These components can be adopted by providers of scientific cloud services built on the INDIGO PaaS layer. The <i>Dashboard</i> translates configuration template files into user-friendly web interfaces, enabling the easy configuration and launch of on-demand applications. The <i>secret management</i> and the <i>encryption</i> components, integrated within the Dashboard, support the secure handling of passphrases and credentials and the deployment of block-level encrypted storage volumes for managing sensitive data in the cloud environment. By adopting these software components, scientific cloud providers can develop convenient, secure and efficient on-demand services for their users.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae140"},"PeriodicalIF":4.0,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11464921/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142401507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"scATAcat: cell-type annotation for scATAC-seq data.","authors":"Aybuge Altay, Martin Vingron","doi":"10.1093/nargab/lqae135","DOIUrl":"https://doi.org/10.1093/nargab/lqae135","url":null,"abstract":"<p><p>Cells whose accessibility landscape has been profiled with scATAC-seq cannot readily be annotated to a particular cell type. In fact, annotating cell-types in scATAC-seq data is a challenging task since, unlike in scRNA-seq data, we lack knowledge of 'marker regions' which could be used for cell-type annotation. Current annotation methods typically translate accessibility to expression space and rely on gene expression patterns. We propose a novel approach, scATAcat, that leverages characterized bulk ATAC-seq data as prototypes to annotate scATAC-seq data. To mitigate the inherent sparsity of single-cell data, we aggregate cells that belong to the same cluster and create pseudobulk. To demonstrate the feasibility of our approach we collected a number of datasets with respective annotations to quantify the results and evaluate performance for scATAcat. scATAcat is available as a python package at https://github.com/aybugealtay/scATAcat.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae135"},"PeriodicalIF":4.0,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11459382/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142396992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TrajectoryGeometry suggests cell fate decisions can involve branches rather than bifurcations.","authors":"Anna Laddach, Vassilis Pachnis, Michael Shapiro","doi":"10.1093/nargab/lqae139","DOIUrl":"10.1093/nargab/lqae139","url":null,"abstract":"<p><p>Differentiation of multipotential progenitor cells is a key process in the development of any multi-cellular organism and often continues throughout its life. It is often assumed that a bi-potential progenitor develops along a (relatively) straight trajectory until it reaches a decision point where the trajectory bifurcates. At this point one of two directions is chosen, each direction representing the unfolding of a new transcriptional programme. However, we have lacked quantitative means for testing this model. Accordingly, we have developed the R package TrajectoryGeometry. Applying this to published data we find several examples where, rather than bifurcate, developmental pathways <i>branch</i>. That is, the bipotential progenitor develops along a relatively straight trajectory leading to one of its potential fates. A second relatively straight trajectory branches off from this towards the other potential fate. In this sense only cells that branch off to follow the second trajectory make a 'decision'. Our methods give precise descriptions of the genes and cellular pathways involved in these trajectories. We speculate that branching may be the more common behaviour and may have advantages from a control-theoretic viewpoint.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae139"},"PeriodicalIF":4.0,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11459380/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142393890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genome wide clustering on integrated chromatin states and Micro-C contacts reveals chromatin interaction signatures.","authors":"Corinne E Sexton, Sylvia Victor Paul, Dylan Barth, Mira V Han","doi":"10.1093/nargab/lqae136","DOIUrl":"10.1093/nargab/lqae136","url":null,"abstract":"<p><p>We can now analyze 3D physical interactions of chromatin regions with chromatin conformation capture technologies, in addition to the 1D chromatin state annotations, but methods to integrate this information are lacking. We propose a method to integrate the chromatin state of interacting regions into a vector representation through the contact-weighted sum of chromatin states. Unsupervised clustering on integrated chromatin states and Micro-C contacts reveals common patterns of chromatin interaction signatures. This provides an integrated view of the complex dynamics of concurrent change occurring in chromatin state and in chromatin interaction, adding another layer of annotation beyond chromatin state or Hi-C contact separately.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae136"},"PeriodicalIF":4.0,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11447530/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142373106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring public cancer gene expression signatures across bulk, single-cell and spatial transcriptomics data with signifinder Bioconductor package.","authors":"Stefania Pirrotta, Laura Masatti, Anna Bortolato, Anna Corrà, Fabiola Pedrini, Martina Aere, Giovanni Esposito, Paolo Martini, Davide Risso, Chiara Romualdi, Enrica Calura","doi":"10.1093/nargab/lqae138","DOIUrl":"10.1093/nargab/lqae138","url":null,"abstract":"<p><p>Understanding cancer mechanisms, defining subtypes, predicting prognosis and assessing therapy efficacy are crucial aspects of cancer research. Gene-expression signatures derived from bulk gene expression data have played a significant role in these endeavors over the past decade. However, recent advancements in high-resolution transcriptomic technologies, such as single-cell RNA sequencing and spatial transcriptomics, have revealed the complex cellular heterogeneity within tumors, necessitating the development of computational tools to characterize tumor mass heterogeneity accurately. Thus we implemented signifinder, a novel R Bioconductor package designed to streamline the collection and use of cancer transcriptional signatures across bulk, single-cell, and spatial transcriptomics data. Leveraging publicly available signatures curated by signifinder, users can assess a wide range of tumor characteristics, including hallmark processes, therapy responses, and tumor microenvironment peculiarities. Through three case studies, we demonstrate the utility of transcriptional signatures in bulk, single-cell, and spatial transcriptomic data analyses, providing insights into cell-resolution transcriptional signatures in oncology. Signifinder represents a significant advancement in cancer transcriptomic data analysis, offering a comprehensive framework for interpreting high-resolution data and addressing tumor complexity.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae138"},"PeriodicalIF":4.0,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11447528/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142373105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"StableMate: a statistical method to select stable predictors in omics data.","authors":"Yidi Deng, Jiadong Mao, Jarny Choi, Kim-Anh Lê Cao","doi":"10.1093/nargab/lqae130","DOIUrl":"https://doi.org/10.1093/nargab/lqae130","url":null,"abstract":"<p><p>Identifying statistical associations between biological variables is crucial to understanding molecular mechanisms. Most association studies are based on correlation or linear regression analyses, but the identified associations often lack reproducibility and interpretability due to the complexity and variability of omics datasets, making it difficult to translate associations into meaningful biological hypotheses. We developed StableMate, a regression framework, to address these challenges through a process of variable selection across heterogeneous datasets. Given datasets from different environments, such as experimental batches, StableMate selects environment-agnostic (stable) and environment-specific predictors in predicting the response of interest. Stable predictors represent robust functional dependencies with the response, and can be used to build regression models that make generalizable predictions in unseen environments. We applied StableMate to (i) RNA sequencing data of breast cancer to discover genes that consistently predict estrogen receptor expression across disease status; (ii) metagenomics data to identify microbial signatures that show persistent association with colon cancer across study cohorts; and (iii) single-cell RNA sequencing data of glioblastoma to discern signature genes associated with the development of pro-tumour microglia regardless of cell location. Our case studies demonstrate that StableMate is adaptable to regression and classification analyses and achieves comprehensive characterization of biological systems for different omics data types.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae130"},"PeriodicalIF":4.0,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11437361/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142355524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}