Alexander Ferrena, Xiang Yu Zheng, Kevyn Jackson, Bang Hoang, Bernice E Morrow, Deyou Zheng
{"title":"scDAPP: a comprehensive single-cell transcriptomics analysis pipeline optimized for cross-group comparison.","authors":"Alexander Ferrena, Xiang Yu Zheng, Kevyn Jackson, Bang Hoang, Bernice E Morrow, Deyou Zheng","doi":"10.1093/nargab/lqae134","DOIUrl":"https://doi.org/10.1093/nargab/lqae134","url":null,"abstract":"<p><p>Single-cell transcriptomics profiling has increasingly been used to evaluate cross-group (or condition) differences in cell population and cell-type gene expression. This often leads to large datasets with complex experimental designs that need advanced comparative analysis. Concurrently, bioinformatics software and analytic approaches have also become more diverse and constantly undergo improvement. Thus, there is an increased need for automated and standardized data processing and analysis pipelines that are also efficient and flexible. To address this need, we develop the <b>s</b>ingle-<b>c</b>ell <b>D</b>ifferential <b>A</b>nalysis and <b>P</b>rocessing <b>P</b>ipeline (scDAPP), an R-based workflow for comparative analysis of single cell (or nucleus) transcriptomic data between two or more groups and at the levels of single cells or 'pseudobulking' samples. The pipeline automates many steps of pre-processing using data-learnt parameters, uses previously benchmarked software, and generates comprehensive intermediate data and final results that are valuable for both beginners and experts in scRNA-seq analysis. Moreover, the analytic reports, augmented by extensive data visualization, increase the transparency of computational analysis and parameter choices, while helping users go seamlessly from raw data to biological interpretation. 
scDAPP is freely available under the MIT license, with source code, documentation and sample data on GitHub (https://github.com/bioinfoDZ/scDAPP).</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 4","pages":"lqae134"},"PeriodicalIF":4.0,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11437360/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142336666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RNAMotifProfile: a graph-based approach to build RNA structural motif profiles.","authors":"Md Mahfuzur Rahaman, Shaojie Zhang","doi":"10.1093/nargab/lqae128","DOIUrl":"https://doi.org/10.1093/nargab/lqae128","url":null,"abstract":"<p><p>RNA structural motifs are the recurrent segments in RNA three-dimensional structures that play a crucial role in the functional diversity of RNAs. Understanding the similarities and variations within these recurrent motif groups is essential for gaining insights into RNA structure and function. While recurrent structural motifs are generally assumed to be composed of the same isosteric base interactions, this consistent pattern is not observed across all examples of these motifs. Existing methods for analyzing and comparing RNA structural motifs may overlook variations in base interactions and associated nucleotides. RNAMotifProfile is a novel profile-to-profile alignment algorithm that generates a comprehensive profile from a group of structural motifs, incorporating all base interactions and associated nucleotides at each position. By structurally aligning input motif instances using a guide-tree-based approach, RNAMotifProfile captures the similarities and variations within recurrent motif groups. Additionally, RNAMotifProfile can function as a motif search tool, enabling the identification of instances of a specific motif family by searching with the corresponding profile. 
The ability to generate accurate and comprehensive profiles for RNA structural motif families, and to search for these motifs, facilitates a deeper understanding of RNA structure-function relationships and potential applications in RNA engineering and therapeutic design.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 3","pages":"lqae128"},"PeriodicalIF":4.0,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11426329/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142355523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Khalique Newaz, Christoph Schaefers, Katja Weisel, Jan Baumbach, Dmitrij Frishman
{"title":"Prognostic importance of splicing-triggered aberrations of protein complex interfaces in cancer.","authors":"Khalique Newaz, Christoph Schaefers, Katja Weisel, Jan Baumbach, Dmitrij Frishman","doi":"10.1093/nargab/lqae133","DOIUrl":"https://doi.org/10.1093/nargab/lqae133","url":null,"abstract":"<p><p>Aberrant alternative splicing (AS) is a prominent hallmark of cancer. AS can perturb protein-protein interactions (PPIs) by adding or removing interface regions encoded by individual exons. Identifying prognostic exon-exon interactions (EEIs) from PPI interfaces can help discover AS-affected cancer-driving PPIs that can serve as potential drug targets. Here, we assessed the prognostic significance of EEIs across 15 cancer types by integrating RNA-seq data with three-dimensional (3D) structures of protein complexes. By analyzing the resulting EEI network we identified patient-specific perturbed EEIs (i.e., EEIs present in healthy samples but absent from the paired cancer samples or vice versa) that were significantly associated with survival. We provide the first evidence that EEIs can be used as prognostic biomarkers for cancer patient survival. Our findings provide mechanistic insights into AS-affected PPI interfaces. 
Given the ongoing expansion of available RNA-seq data and the number of 3D structurally-resolved (or confidently predicted) protein complexes, our computational framework will help accelerate the discovery of clinically important cancer-promoting AS events.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 3","pages":"lqae133"},"PeriodicalIF":4.0,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11426328/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142355522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jakob Steuer, Malte Sinn, Franziska Eble, Sina Rütschlin, Thomas Böttcher, Jörg S Hartig, Christine Peter
{"title":"Cooperative binding of bivalent ligands yields new insights into the guanidine-II riboswitch.","authors":"Jakob Steuer, Malte Sinn, Franziska Eble, Sina Rütschlin, Thomas Böttcher, Jörg S Hartig, Christine Peter","doi":"10.1093/nargab/lqae132","DOIUrl":"https://doi.org/10.1093/nargab/lqae132","url":null,"abstract":"<p><p>Riboswitches are involved in regulating gene expression in bacteria. They are located within the untranslated regions of bacterial messenger RNA and function as switches by adjusting their shape, depending on the presence or absence of specific ligands. To decipher the fundamental aspects of bacterial gene control, it is therefore important to understand the mechanisms that underlie these conformational switches. To this end, a combination of an experimental binding study, molecular simulations and machine learning has been employed to obtain insights into the conformational changes and structural dynamics of the guanidine-II riboswitch. By exploiting the design of a bivalent ligand, we were able to study ligand binding in the aptamer dimer at the molecular level. Spontaneous ligand-binding events, which are usually difficult to simulate, were observed and the contributing factors are described. These findings were further confirmed by <i>in vivo</i> experiments, where the cooperative binding effects of the bivalent ligands resulted in increased binding affinity compared to the native guanidinium ligand. 
Beyond ligand binding itself, the simulations revealed a novel, ligand-dependent base-stacking interaction outside of the binding pocket that stabilizes the riboswitch.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 3","pages":"lqae132"},"PeriodicalIF":4.0,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11423145/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142355517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Krzysztof Kotlarz, Magda Mielczarek, Przemysław Biecek, Bernt Guldbrandtsen, Joanna Szyda
{"title":"Exploring the impact of sequence context on errors in SNP genotype calling with whole genome sequencing data using AI-based autoencoder approach.","authors":"Krzysztof Kotlarz, Magda Mielczarek, Przemysław Biecek, Bernt Guldbrandtsen, Joanna Szyda","doi":"10.1093/nargab/lqae131","DOIUrl":"https://doi.org/10.1093/nargab/lqae131","url":null,"abstract":"<p><p>A critical step in the analysis of whole genome sequencing data is variant calling. Despite its importance, variant calling is prone to errors. Our study investigated the association between incorrect single nucleotide polymorphism (SNP) calls and variant quality metrics and nucleotide context. In our study, incorrect SNPs were defined in 20 Holstein-Friesian cows by comparing their SNP genotypes identified by whole genome sequencing with the Illumina NovaSeq 6000 and the EuroGMD50K genotyping microarray. The dataset was divided into the correct SNP set (666 333 SNPs) and the incorrect SNP set (4 557 SNPs). The training dataset consisted of only the correct SNPs, while the test dataset contained a balanced mix of all the incorrectly and correctly called SNPs. An autoencoder was constructed to identify systematically incorrect SNPs that were marked as outliers by a one-class support vector machine and isolation forest algorithms. The results showed that 59.53% (±0.39%) of the incorrect SNPs had systematic patterns, with the remainder being random errors. The frequent occurrence of the CGC 3-mer was due to mislabelling a call for C. An incorrect T call instead of A was associated with the presence of T in the neighbouring downstream position. 
These errors may arise due to the fluorescence patterns of nucleotide labelling.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 3","pages":"lqae131"},"PeriodicalIF":4.0,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11420682/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142355520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Praveen Krishna Chitneedi, Frieder Hadlich, Gabriel C M Moreira, Jose Espinosa-Carrasco, Changxi Li, Graham Plastow, Daniel Fischer, Carole Charlier, Dominique Rocha, Amanda J Chamberlain, Christa Kuehn
{"title":"eQTL-Detect: nextflow-based pipeline for eQTL detection in modular format with sharable and parallelizable scripts.","authors":"Praveen Krishna Chitneedi, Frieder Hadlich, Gabriel C M Moreira, Jose Espinosa-Carrasco, Changxi Li, Graham Plastow, Daniel Fischer, Carole Charlier, Dominique Rocha, Amanda J Chamberlain, Christa Kuehn","doi":"10.1093/nargab/lqae122","DOIUrl":"https://doi.org/10.1093/nargab/lqae122","url":null,"abstract":"<p><p>Bioinformatic pipelines are becoming increasingly complex with the ever-accumulating amount of next-generation sequencing (NGS) data. Their orchestration is difficult with a simple Bash script, but bioinformatics workflow managers such as Nextflow provide a framework to overcome these problems. This study used Nextflow to develop a bioinformatic pipeline for detecting expression quantitative trait loci (eQTL) using the modular Nextflow DSL2 syntax, so that the large computing demand and the data access limitations often associated with eQTL studies can be handled across different partners. Based on the runtime and computational resources measured in a test run with pilot data, the new pipeline should be suitable for large-scale eQTL studies.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 3","pages":"lqae122"},"PeriodicalIF":4.0,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11420669/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142355519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Florence Rufflé, Jérôme Reboul, Anthony Boureux, Benoit Guibert, Chloé Bessière, Raissa Silva, Eric Jourdan, Jean-Baptiste Gaillard, Anne Boland, Jean-François Deleuze, Catherine Sénamaud-Beaufort, Dorothée Selimoglu-Buet, Eric Solary, Nicolas Gilbert, Thérèse Commes
{"title":"Effective requesting method to detect fusion transcripts in chronic myelomonocytic leukemia RNA-seq.","authors":"Florence Rufflé, Jérôme Reboul, Anthony Boureux, Benoit Guibert, Chloé Bessière, Raissa Silva, Eric Jourdan, Jean-Baptiste Gaillard, Anne Boland, Jean-François Deleuze, Catherine Sénamaud-Beaufort, Dorothée Selimoglu-Buet, Eric Solary, Nicolas Gilbert, Thérèse Commes","doi":"10.1093/nargab/lqae117","DOIUrl":"https://doi.org/10.1093/nargab/lqae117","url":null,"abstract":"<p><p>RNA sequencing technology combining short read and long read analysis can be used to detect chimeric RNAs in malignant cells. Here, we propose an integrated approach that uses k-mers to analyze indexed datasets. This approach is used to identify chimeric RNAs in chronic myelomonocytic leukemia (CMML) cells, a myeloid malignancy that combines features of myelodysplastic and myeloproliferative neoplasms. In virtually every CMML patient, next-generation sequencing identifies one or several somatic driver mutations, typically affecting epigenetic, splicing and signaling genes. In contrast, cytogenetic aberrations are currently detected in only one third of the cases. Nevertheless, chromosomal abnormalities contribute to patient stratification, some of them being associated with higher risk of poor outcome, e.g. through transformation into acute myeloid leukemia (AML). Our approach selects four chimeric RNAs that have been detected and validated in CMML cells. We further focus on <i>NRIP1-MIR99AHG</i>, as this fusion has also recently been detected in AML cells. We show that this fusion encodes three isoforms, including a novel one. Further studies will decipher the biological significance of such a fusion and its potential to improve disease stratification. 
Taken together, this report demonstrates the ability of a large-scale approach to detect chimeric RNAs in cancer cells.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 3","pages":"lqae117"},"PeriodicalIF":4.0,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11420675/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142355518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Luca Schlegel, Rohan Bhardwaj, Yadollah Shahryary, Defne Demirtürk, Alexandre P Marand, Robert J Schmitz, Frank Johannes
{"title":"GenomicLinks: deep learning predictions of 3D chromatin interactions in the maize genome.","authors":"Luca Schlegel, Rohan Bhardwaj, Yadollah Shahryary, Defne Demirtürk, Alexandre P Marand, Robert J Schmitz, Frank Johannes","doi":"10.1093/nargab/lqae123","DOIUrl":"https://doi.org/10.1093/nargab/lqae123","url":null,"abstract":"<p><p>Gene regulation in eukaryotes is partly shaped by the 3D organization of chromatin within the cell nucleus. Distal interactions between <i>cis</i>-regulatory elements and their target genes are widespread, and many causal loci underlying heritable agricultural traits have been mapped to distal non-coding elements. The biology underlying chromatin loop formation in plants is poorly understood. Dissecting the sequence features that mediate distal interactions is an important step toward identifying putative molecular mechanisms. Here, we trained GenomicLinks, a deep learning model, to identify DNA sequence features predictive of 3D chromatin interactions in maize. We found that the presence of binding motifs of specific transcription factor classes, especially bHLH, is predictive of chromatin interaction specificities. Using an <i>in silico</i> mutagenesis approach, we show that the removal of these motifs from loop anchors leads to reduced interaction probabilities. We were able to validate these predictions with single-cell co-accessibility data from different maize genotypes that harbor natural substitutions in these TF binding motifs. 
GenomicLinks is currently implemented as an open-source web tool, which should facilitate its wider use in the plant research community.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 3","pages":"lqae123"},"PeriodicalIF":4.0,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11420838/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142355521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative analysis of single-cell pathway scoring methods and a novel approach.","authors":"Ruoqiao H Wang, Juilee Thakar","doi":"10.1093/nargab/lqae124","DOIUrl":"https://doi.org/10.1093/nargab/lqae124","url":null,"abstract":"<p><p>Single-cell gene set analysis (scGSA) provides a useful approach for quantifying molecular functions and pathways in high-throughput transcriptomic data, facilitating the biological interpretation of complex human datasets. However, various factors such as gene set size, quality of the gene sets and dropouts affect the performance of scGSA. To address these limitations, we present a single-cell Pathway Score (scPS) method to measure gene set activity at single-cell resolution. Furthermore, we benchmark our method against six other methods: AUCell, AddModuleScore, JASMINE, UCell, SCSE and ssGSEA. The comparison across all the methods using two different simulation approaches highlights the effect of cell count, gene set size, noise, condition-specific genes and zero imputation on their performance. The results of our study indicate that scPS is comparable with other single-cell scoring methods and detects fewer false positives. Importantly, this work reveals critical variables in scGSA.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 3","pages":"lqae124"},"PeriodicalIF":4.0,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11420841/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142355503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A generalized protein identification method for novel and diverse sequencing technologies.","authors":"Bikash Kumar Bhandari, Nick Goldman","doi":"10.1093/nargab/lqae126","DOIUrl":"https://doi.org/10.1093/nargab/lqae126","url":null,"abstract":"<p><p>Protein sequencing is a rapidly evolving field with much progress towards the realization of a new generation of protein sequencers. The early devices, however, may not be able to reliably discriminate all 20 amino acids, resulting in a partial, noisy and possibly error-prone signature of a protein. Rather than achieving <i>de novo</i> sequencing, these devices may aim to identify target proteins by comparing such signatures to databases of known proteins. However, there are no broadly applicable methods for this identification problem. Here, we devise a hidden Markov model method to study the generalized problem of protein identification from noisy signature data. Based on a hypothetical sequencing device that can simulate several novel technologies, we show that on the human protein database (<i>N</i> = 20 181) our method performs well under many different operating conditions such as various levels of signal resolvability, different numbers of discriminated amino acids, sequence fragments, and insertion and deletion error rates. Our results demonstrate the possibility of protein identification with high accuracy on many early experimental devices. 
We anticipate that our method will be applicable to a wide range of protein sequencing devices in the future.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 3","pages":"lqae126"},"PeriodicalIF":4.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11409062/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142297107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}