{"title":"The axes of biology: a novel axes-based network embedding paradigm to decipher the functional mechanisms of the cell.","authors":"Sergio Doria-Belenguer, Alexandros Xenos, Gaia Ceddia, Noël Malod-Dognin, Nataša Pržulj","doi":"10.1093/bioadv/vbae075","DOIUrl":"10.1093/bioadv/vbae075","url":null,"abstract":"<p><strong>Summary: </strong>Common approaches for deciphering biological networks involve network embedding algorithms. These approaches strictly focus on clustering the genes' embedding vectors and interpreting such clusters to reveal the hidden information of the networks. However, the difficulty in interpreting the genes' clusters and the limitations of the functional annotations' resources hinder the identification of the currently unknown cell's functioning mechanisms. We propose a new approach that shifts this functional exploration from the embedding vectors of genes in space to the axes of the space itself. Our methodology better disentangles biological information from the embedding space than the classic gene-centric approach. Moreover, it uncovers new data-driven functional interactions that are unregistered in the functional ontologies, but biologically coherent. Furthermore, we exploit these interactions to define new higher-level annotations that we term Axes-Specific Functional Annotations and validate them through literature curation. 
Finally, we leverage our methodology to discover evolutionary connections between cellular functions and the evolution of species.</p><p><strong>Availability and implementation: </strong>Data and source code can be accessed at https://gitlab.bsc.es/sdoria/axes-of-biology.git.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11142626/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141201302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Felipe Rojas-Rodríguez, Marjanka K Schmidt, S. Canisius
{"title":"Assessing the validity of driver gene identification tools for targeted genome sequencing data","authors":"Felipe Rojas-Rodríguez, Marjanka K Schmidt, S. Canisius","doi":"10.1093/bioadv/vbae073","DOIUrl":"https://doi.org/10.1093/bioadv/vbae073","url":null,"abstract":"Most cancer driver gene identification tools have been developed for whole-exome sequencing data. Targeted sequencing is a popular alternative to whole-exome sequencing for large cancer studies due to its greater depth at a lower cost per tumor. Unlike whole-exome sequencing, targeted sequencing only enables mutation calling for a selected subset of genes. Whether existing driver gene identification tools remain valid in that context has not previously been studied.\n\nWe evaluated the validity of seven popular driver gene identification tools when applied to targeted sequencing data. Based on whole-exome data of 14 different cancer types from TCGA, we constructed matching targeted datasets by keeping only the mutations overlapping with the pan-cancer MSK-IMPACT panel and, in the case of breast cancer, also the breast-cancer-specific B-CAST panel. We then compared the driver gene predictions obtained on whole-exome and targeted mutation data for each of the seven tools. Differences in how the tools model background mutation rates were the most important determinant of their validity on targeted sequencing data. 
Based on our results, we recommend OncodriveFML, OncodriveCLUSTL, 20/20+, dNdSCv, and ActiveDriver for driver gene identification in targeted sequencing data, whereas MutSigCV and DriverML are best avoided in that context.\n\nSupplementary data are available at Bioinformatics Advances online.","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141104097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics Advances | Pub Date: 2024-05-22 | eCollection Date: 2024-01-01 | DOI: 10.1093/bioadv/vbae069
Christos A Ouzounis
{"title":"Biology's transformation: from observation through experiment to computation.","authors":"Christos A Ouzounis","doi":"10.1093/bioadv/vbae069","DOIUrl":"10.1093/bioadv/vbae069","url":null,"abstract":"<p><strong>Summary: </strong>We explore the nuanced temporal and epistemological distinctions among natural sciences, particularly the contrasting treatment of time and the interplay between theory and experimentation. Physics, an exemplar of mature science, relies on theoretical models for predictability and simulations. In contrast, biology, traditionally experimental, is witnessing a computational surge, with data analytics and simulations reshaping its research paradigms. Despite these strides, a unified theoretical framework in biology remains elusive. We propose that contemporary global challenges might usher in a renewed emphasis, presenting an opportunity for the establishment of a novel theoretical underpinning for the life sciences.</p><p><strong>Availability and implementation: </strong>https://github.com/ouzounis/CLS-emerges Data in Json format, Images in PNG format.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11127110/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141154743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeGeCI 1.1: a web platform for gene annotation of mitochondrial genomes","authors":"Lisa Fiedler, Matthias Bernt, Martin Middendorf","doi":"10.1093/bioadv/vbae072","DOIUrl":"https://doi.org/10.1093/bioadv/vbae072","url":null,"abstract":"DeGeCI is a command line tool that generates fully automated de novo gene predictions from mitochondrial nucleotide sequences by using a reference database of annotated mitogenomes which is represented as a de Bruijn graph. The input genome is mapped to this graph, creating a subgraph, which is then post-processed by a clustering routine. Version 1.1 of DeGeCI offers a web front-end for GUI-based input. It also introduces a new taxonomic filter pipeline that allows the species in the reference database to be restricted to a user-specified taxonomic classification and allows for gene boundary optimization when providing the translation table of the input genome.\n\nThe web platform is accessible at https://degeci.informatik.uni-leipzig.de. Source code is freely available at https://git.informatik.uni-leipzig.de/lfiedler/degeci.","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140985216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daejin Hyung, Soo Young Cho, Kyubin Lee, Namhee Yu, Sehwa Hong, Charny Park
{"title":"ASpedia-R: a package to retrieve junction-incorporating features and knowledge-based functions of human alternative splicing events","authors":"Daejin Hyung, Soo Young Cho, Kyubin Lee, Namhee Yu, Sehwa Hong, Charny Park","doi":"10.1093/bioadv/vbae071","DOIUrl":"https://doi.org/10.1093/bioadv/vbae071","url":null,"abstract":"Alternative splicing (AS) is a key regulatory mechanism that confers genetic diversity and phenotypic plasticity in humans. Exons and their flanking regions include comprehensive junction-incorporating sequence features such as splicing factor binding sites and protein domains. These elements are involved in exon usage and ultimately contribute to isoform-specific biological functions. Splicing-associated sequence features are involved in multilayered regulation encompassing DNA and proteins. However, most analysis applications have investigated only a limited set of sequence features, such as protein domains. This is insufficient to explain the comprehensive cause and effect of exon-specific biological processes.\nWith the advent of RNA-seq technology, global AS event analysis has produced more precise results. As analysis results accumulate, identifying multi-omics sequence features for AS events becomes a challenge. Therefore, an application that investigates multi-omics sequence features is useful for scanning critical evidence.\nASpedia-R is an R package to interrogate junction-incorporating sequence features for human genes. Our database collects heterogeneous profiles spanning DNA to protein. Additionally, knowledge-based splicing genes were collected using text-mining to test the association with specific pathway terms. Our package retrieves AS events for high-throughput data analysis results via an AS event ID converter. 
Finally, the result profile can be visualized and saved in multiple formats: sequence feature result table, genome track figure, protein-protein interaction network, and gene set enrichment test result table. Our package is a convenient tool for understanding global regulation mechanisms by splicing.\n\nThe package source code is freely available to non-commercial users at https://github.com/ncc-bioinfo/ASpedia-R.\nSupplementary data are available at Bioinformatics Advances online.","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140988292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Hulshof, Bram Nap, Filippo Martinelli, Ines Thiele
{"title":"Microbial abundances retrieved from sequencing data—automated NCBI taxonomy (MARS): a pipeline to create relative microbial abundance data for the microbiome modelling toolbox and utilising homosynonyms for efficient mapping to resources","authors":"T. Hulshof, Bram Nap, Filippo Martinelli, Ines Thiele","doi":"10.1093/bioadv/vbae068","DOIUrl":"https://doi.org/10.1093/bioadv/vbae068","url":null,"abstract":"Computational approaches to the functional characterisation of the microbiome, such as the Microbiome Modelling Toolbox, require precise information on microbial composition and relative abundances. However, challenges arise from homosynonyms—different names referring to the same taxon, which can hinder the mapping process and lead to missed species mapping when using microbial metabolic reconstruction resources, such as AGORA and APOLLO.\n\nWe introduce the integrated MARS pipeline, a user-friendly Python-based solution that addresses these challenges. MARS automates the extraction of relative abundances from metagenomic reads, maps species and genera onto microbial metabolic reconstructions, and accounts for alternative taxonomic names. It normalises microbial reads, provides an optional cut-off for low-abundance taxa, and produces relative abundance tables apt for integration with the Microbiome Modelling Toolbox. A sub-component of the pipeline automates the task of identifying homosynonyms, leveraging web scraping to find taxonomic IDs of given species, searching NCBI for alternative names, and cross-referencing them with microbial reconstruction resources. Taken together, MARS streamlines the entire process from processed metagenomic reads to relative abundance, thereby significantly reducing time and effort when working with microbiome data.\n\nMARS is implemented in Python. 
It can be found as an interactive application here: https://mars-pipeline.streamlit.app/ along with detailed documentation here: https://github.com/ThieleLab/mars-pipeline.","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140992636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. Akdeniz, O. Frei, Espen Hagen, T. T. Filiz, Sandeep Karthikeyan, Joëlle Pasman, Andreas Jangmo, Jacob Bergstedt, John R Shorter, Richard Zetterberg, J. Meijsen, I. Sønderby, Alfonso Buil, M. Tesli, Yi Lu, Patrick Sullivan, Ole A Andreassen, E. Hovig
{"title":"COSGAP: COntainerized statistical genetics analysis pipelines","authors":"B. Akdeniz, O. Frei, Espen Hagen, T. T. Filiz, Sandeep Karthikeyan, Joëlle Pasman, Andreas Jangmo, Jacob Bergstedt, John R Shorter, Richard Zetterberg, J. Meijsen, I. Sønderby, Alfonso Buil, M. Tesli, Yi Lu, Patrick Sullivan, Ole A Andreassen, E. Hovig","doi":"10.1093/bioadv/vbae067","DOIUrl":"https://doi.org/10.1093/bioadv/vbae067","url":null,"abstract":"The collection and analysis of sensitive data in large-scale consortia for statistical genetics is hampered by multiple challenges, due to their non-shareable nature. Time-consuming issues in installing software frequently arise due to different operating systems, software dependencies, and limited internet access. For federated analysis across sites, it can be challenging to resolve different problems, including format requirements, data wrangling, setting up analysis on high-performance computing facilities, etc. Easier, more standardized, automated protocols and pipelines can be solutions to overcome these issues. We have developed one such solution for statistical genetic data analysis using software container technologies. This solution, named COSGAP: “COntainerized Statistical Genetics Analysis Pipelines”, consists of already established software tools placed into Singularity containers, alongside corresponding code and instructions on how to perform statistical genetic analyses, such as genome-wide association studies, polygenic scoring, LD score regression, Gaussian Mixture Models, and gene-set analysis. Using provided helper scripts written in Python, users can obtain auto-generated scripts to conduct the desired analysis either on HPC facilities or on a personal computer. 
COSGAP is actively being applied by users from different countries and projects to conduct genetic data analyses without spending much effort on software installation, converting data formats, and other technical requirements.\n\nCOSGAP is freely available on GitHub (https://github.com/comorment/containers) under the GPLv3 license.","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140996941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics Advances | Pub Date: 2024-05-08 | eCollection Date: 2024-01-01 | DOI: 10.1093/bioadv/vbae066
Willow Kion-Crosby, Lars Barquist
{"title":"Network depth affects inference of gene sets from bacterial transcriptomes using denoising autoencoders.","authors":"Willow Kion-Crosby, Lars Barquist","doi":"10.1093/bioadv/vbae066","DOIUrl":"10.1093/bioadv/vbae066","url":null,"abstract":"<p><strong>Summary: </strong>The increasing number of publicly available bacterial gene expression data sets provides an unprecedented resource for the study of gene regulation in diverse conditions, but emphasizes the need for self-supervised methods for the automated generation of new hypotheses. One approach for inferring coordinated regulation from bacterial expression data is through neural networks known as denoising autoencoders (DAEs) which encode large datasets in a reduced bottleneck layer. We have generalized this application of DAEs to include deep networks and explore the effects of network architecture on gene set inference using deep learning. We developed a DAE-based pipeline to extract gene sets from transcriptomic data in <i>Escherichia coli</i>, validate our method by comparing inferred gene sets with known pathways, and have used this pipeline to explore how the choice of network architecture impacts gene set recovery. We find that increasing network depth leads the DAEs to explain gene expression in terms of fewer, more concisely defined gene sets, and that adjusting the width results in a tradeoff between generalizability and biological inference. 
Finally, leveraging our understanding of the impact of DAE architecture, we apply our pipeline to an independent uropathogenic <i>E. coli</i> dataset to identify genes uniquely induced during human colonization.</p><p><strong>Availability and implementation: </strong>https://github.com/BarquistLab/DAE_architecture_exploration.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11256956/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics Advances | Pub Date: 2024-05-08 | eCollection Date: 2024-01-01 | DOI: 10.1093/bioadv/vbae063
Jingcheng Yang, Mo Sun, Zihan Ran, Taehwan Yang, Deepali L Kundnani, Francesca Storici, Penghao Xu
{"title":"rNMPID: a database for riboNucleoside MonoPhosphates in DNA.","authors":"Jingcheng Yang, Mo Sun, Zihan Ran, Taehwan Yang, Deepali L Kundnani, Francesca Storici, Penghao Xu","doi":"10.1093/bioadv/vbae063","DOIUrl":"10.1093/bioadv/vbae063","url":null,"abstract":"<p><strong>Motivation: </strong>Ribonucleoside monophosphates (rNMPs) are the most abundant non-standard nucleotides embedded in genomic DNA. If the presence of rNMP in DNA cannot be controlled, it can lead to genome instability. The actual regulatory functions of rNMPs in DNA remain mainly unknown. Considering the association between rNMP embedment and various diseases and cancer, the phenomenon of rNMP embedment in DNA has become a prominent area of research in recent years.</p><p><strong>Results: </strong>We introduce the rNMPID database, which is the first database revealing rNMP-embedment characteristics, strand bias, and preferred incorporation patterns in the genomic DNA of samples from bacterial to human cells of different genetic backgrounds. The rNMPID database uses datasets generated by different rNMP-mapping techniques. It provides the researchers with a solid foundation to explore the features of rNMP embedded in the genomic DNA of multiple sources, and their association with cellular functions, and, in future, disease. 
It also significantly benefits researchers in the fields of genetics and genomics who aim to integrate their studies with the rNMP-embedment data.</p><p><strong>Availability and implementation: </strong>rNMPID is freely accessible on the web at https://www.rnmpid.org.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11088741/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140913559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ashley V. Schwartz, Karilyn E. Sant, Uduak Z. George
{"title":"danRerLib: a python package for zebrafish transcriptomics","authors":"Ashley V. Schwartz, Karilyn E. Sant, Uduak Z. George","doi":"10.1093/bioadv/vbae065","DOIUrl":"https://doi.org/10.1093/bioadv/vbae065","url":null,"abstract":"Understanding the pathways and biological processes underlying differential gene expression is fundamental for characterizing gene expression changes in response to an experimental condition. Zebrafish, with a transcriptome closely mirroring that of humans, are frequently utilized as a model for human development and disease. However, a challenge arises due to the incomplete annotations of zebrafish pathways and biological processes, with more comprehensive annotations existing in humans. This incompleteness may result in biased functional enrichment findings and loss of knowledge. danRerLib, a versatile Python package for zebrafish transcriptomics researchers, overcomes this challenge and provides a suite of tools to be executed in Python including gene ID mapping, orthology mapping for the zebrafish and human taxonomy, and functional enrichment analysis utilizing the latest updated Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. danRerLib enables functional enrichment analysis for GO and KEGG pathways, even when they lack direct zebrafish annotations through the orthology of human-annotated functional annotations. This approach enables researchers to extend their analysis to a wider range of pathways, elucidating additional mechanisms of interest and greater insight into experimental results.\n\ndanRerLib, along with comprehensive documentation and tutorials, is freely available. The source code is available at https://github.com/sdsucomptox/danrerlib/ with associated documentation and tutorials at https://sdsucomptox.github.io/danrerlib/. 
The package has been developed with Python 3.9 and is available for installation on the package management systems PIP (https://pypi.org/project/danrerlib/) and Conda (https://anaconda.org/sdsu_comptox/danrerlib) with additional installation instructions on the documentation website.","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141007114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}