Bioinformatics advances; Pub Date: 2024-11-14; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae175
Bianka Alexandra Pasat, Eleftherios Pilalis, Katarzyna Mnich, Afshin Samali, Aristotelis Chatziioannou, Adrienne M Gorman
{"title":"MultiOmicsIntegrator: a nextflow pipeline for integrated omics analyses.","authors":"Bianka Alexandra Pasat, Eleftherios Pilalis, Katarzyna Mnich, Afshin Samali, Aristotelis Chatziioannou, Adrienne M Gorman","doi":"10.1093/bioadv/vbae175","DOIUrl":"10.1093/bioadv/vbae175","url":null,"abstract":"<p><strong>Motivation: </strong>Analysis of gene and isoform expression levels is becoming critical for the detailed understanding of biochemical mechanisms. In addition, integrating RNA-seq data with other omics data types, such as proteomics and metabolomics, provides a strong approach for consolidating our understanding of biological processes across various organizational tiers, thus promoting the identification of potential therapeutic targets.</p><p><strong>Results: </strong>We present MultiOmicsIntegrator (MOI), an inclusive pipeline for comprehensive omics analyses. MOI represents a unified approach that performs in-depth individual analyses of diverse omics data types. Specifically, it offers exhaustive analysis of RNA-seq data at the level of genes, gene isoforms, and miRNA, coupled with functional annotation and structure prediction of these transcripts. Additionally, proteomics and metabolomics data are supported, providing a holistic view of biological systems.
Finally, MOI has tools to integrate simultaneously multiple and diverse omics datasets, with both data- and function-driven approaches, fostering a deeper understanding of intricate biological interactions.</p><p><strong>Availability and implementation: </strong>MOI and ReadTheDocs.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae175"},"PeriodicalIF":2.4,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11576358/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142683388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2024-11-13; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae155
Julia Wrobel, Alex C Soupir, Mitchell T Hayes, Lauren C Peres, Thao Vu, Andrew Leroux, Brooke L Fridley
{"title":"mxfda: a comprehensive toolkit for functional data analysis of single-cell spatial data.","authors":"Julia Wrobel, Alex C Soupir, Mitchell T Hayes, Lauren C Peres, Thao Vu, Andrew Leroux, Brooke L Fridley","doi":"10.1093/bioadv/vbae155","DOIUrl":"10.1093/bioadv/vbae155","url":null,"abstract":"<p><strong>Summary: </strong>Technologies that produce spatial single-cell (SC) data have revolutionized the study of tissue microstructures and promise to advance personalized treatment of cancer by revealing new insights about the tumor microenvironment. Functional data analysis (FDA) is an ideal analytic framework for connecting cell spatial relationships to patient outcomes, but can be challenging to implement. To address this need, we present mxfda, an R package for end-to-end analysis of SC spatial data using FDA. mxfda implements a suite of methods to facilitate spatial analysis of SC imaging data using FDA techniques.</p><p><strong>Availability and implementation: </strong>The mxfda R package is freely available at https://cran.r-project.org/package=mxfda and has detailed documentation, including four vignettes, available at http://juliawrobel.com/mxfda/.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae155"},"PeriodicalIF":2.4,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11568348/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142649354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2024-11-13; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae174
Timothy Páez-Watson, Ricardo Hernández Medina, Loek Vellekoop, Mark C M van Loosdrecht, S Aljoscha Wahl
{"title":"Conditional flux balance analysis toolbox for python: application to research metabolism in cyclic environments.","authors":"Timothy Páez-Watson, Ricardo Hernández Medina, Loek Vellekoop, Mark C M van Loosdrecht, S Aljoscha Wahl","doi":"10.1093/bioadv/vbae174","DOIUrl":"10.1093/bioadv/vbae174","url":null,"abstract":"<p><strong>Summary: </strong>We present py_cFBA, a Python-based toolbox for conditional flux balance analysis (cFBA). Our toolbox allows for an easy implementation of cFBA models using a well-documented and modular approach and supports the generation of Systems Biology Markup Language models. The toolbox is designed to be user-friendly, versatile, and freely available to non-commercial users, serving as a valuable resource for researchers predicting metabolic behaviour with resource allocation in dynamic-cyclic environments.</p><p><strong>Availability and implementation: </strong>Extensive documentation, installation steps, tutorials, and examples are available at https://tp-watson-python-cfba.readthedocs.io/en/. The py_cFBA python package is available at https://pypi.org/project/py-cfba/.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae174"},"PeriodicalIF":2.4,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11593493/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142735127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2024-11-12; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae177
Stefanie Lück, Uwe Scholz, Dimitar Douchkov
{"title":"Introducing GWAStic: a user-friendly, cross-platform solution for genome-wide association studies and genomic prediction.","authors":"Stefanie Lück, Uwe Scholz, Dimitar Douchkov","doi":"10.1093/bioadv/vbae177","DOIUrl":"10.1093/bioadv/vbae177","url":null,"abstract":"<p><strong>Motivation: </strong>Advances in genomics have created an insistent need for accessible tools that simplify complex genetic data analysis, enabling researchers across fields to harness the power of genome-wide association studies and genomic prediction. GWAStic was developed to bridge this gap, providing an intuitive platform that combines artificial intelligence with traditional statistical methods, making sophisticated genomic analysis accessible without requiring deep expertise in statistical software.</p><p><strong>Results: </strong>We present GWAStic, an intuitive, cross-platform desktop application designed to streamline genome-wide association studies and genomic prediction for biological and medical researchers. With a user-friendly graphical interface, GWAStic integrates machine learning and traditional statistical approaches to support genetic analysis. 
The application accepts inputs from standard text-based Variant Call Formats and PLINK binary files, generating clear graphical outputs, including Manhattan plots, quantile-quantile plots, and genomic prediction correlation plots to enhance data visualization and analysis.</p><p><strong>Availability and implementation: </strong>Project page: https://github.com/snowformatics/gwastic_desktop; GWAStic documentation: https://snowformatics.gitbook.io/product-docs; PyPI: https://pypi.org/project/gwastic-desktop/.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae177"},"PeriodicalIF":2.4,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11643344/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142831010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2024-11-09; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae176
Xiangnan Li, Yaqi Huang, Shuming Wang, Meng Hao, Yi Li, Hui Zhang, Zixin Hu
{"title":"LUKB: preparing local UK Biobank data for analysis.","authors":"Xiangnan Li, Yaqi Huang, Shuming Wang, Meng Hao, Yi Li, Hui Zhang, Zixin Hu","doi":"10.1093/bioadv/vbae176","DOIUrl":"10.1093/bioadv/vbae176","url":null,"abstract":"<p><strong>Motivation: </strong>The UK Biobank data holds immense potential for human health research. However, the complex data preparation and interpretation processes often act as barriers for researchers, diverting them from their core research questions.</p><p><strong>Results: </strong>We developed LUKB, an R Shiny-based web tool that simplifies UK Biobank data preparation by automating these preprocessing tasks. LUKB reduces preprocessing time and integrates functions for initial data exploration, allowing researchers to dedicate more time to their scientific endeavors. Detailed deployment and usage can be found in the Supplementary Data.</p><p><strong>Availability and implementation: </strong>LUKB is freely available at https://github.com/HaiGenBuShang/LUKB.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae176"},"PeriodicalIF":2.4,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11580680/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142689856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2024-11-07; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae158
Chaoyue Sun, Yanjun Li, Simone Marini, Alberto Riva, Dapeng Oliver Wu, Ruogu Fang, Marco Salemi, Brittany Rife Magalis
{"title":"Phylogenetic-informed graph deep learning to classify dynamic transmission clusters in infectious disease epidemics.","authors":"Chaoyue Sun, Yanjun Li, Simone Marini, Alberto Riva, Dapeng Oliver Wu, Ruogu Fang, Marco Salemi, Brittany Rife Magalis","doi":"10.1093/bioadv/vbae158","DOIUrl":"https://doi.org/10.1093/bioadv/vbae158","url":null,"abstract":"<p><strong>Motivation: </strong>In the midst of an outbreak, identification of groups of individuals that represent risk for transmission of the pathogen under investigation is critical to public health efforts. Dynamic transmission patterns within these clusters, whether they result from changes at the level of the virus (e.g. infectivity) or host (e.g. vaccination), are critical in strategizing public health interventions, particularly when resources are limited. Phylogenetic trees are widely used in the detection of transmission clusters, and the topological shape of the branches within them can also be a useful source of information regarding the dynamics of the represented population.</p><p><strong>Results: </strong>We evaluated the limitations of existing tree shape metrics when dealing with dynamic transmission clusters and propose instead a phylogeny-based deep learning system, <i>DeepDynaTree</i>, for dynamic classification. Comprehensive experiments carried out on a variety of simulated epidemic growth models and HIV epidemic data indicate that this graph deep learning approach is effective, robust, and informative for cluster dynamic prediction.
Our results confirm that <i>DeepDynaTree</i> is a promising tool for transmission cluster characterization that can be modified to address the existing limitations and deficiencies in knowledge regarding the dynamics of transmission trajectories for groups at risk of pathogen infection.</p><p><strong>Availability and implementation: </strong><i>DeepDynaTree</i> is available under an MIT Licence in https://github.com/salemilab/DeepDynaTree.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae158"},"PeriodicalIF":2.4,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11552518/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142633757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2024-11-05; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbae172
Stephen Chapman, Theo Brunet, Arnaud Mourier, Bianca H Habermann
{"title":"MitoMAMMAL: a genome scale model of mammalian mitochondria predicts cardiac and BAT metabolism.","authors":"Stephen Chapman, Theo Brunet, Arnaud Mourier, Bianca H Habermann","doi":"10.1093/bioadv/vbae172","DOIUrl":"https://doi.org/10.1093/bioadv/vbae172","url":null,"abstract":"<p><strong>Motivation: </strong>Mitochondria are essential for cellular metabolism and are inherently flexible to allow correct function in a wide range of tissues. Consequently, dysregulated mitochondrial metabolism affects different tissues in different ways, leading to challenges in understanding the pathology of mitochondrial diseases. System-level metabolic modelling is useful in studying tissue-specific mitochondrial metabolism, yet despite the mouse being a common model organism in research, no mouse-specific mitochondrial metabolic model is currently available.</p><p><strong>Results: </strong>Building upon the similarity between human and mouse mitochondrial metabolism, we present mitoMammal, a genome-scale metabolic model that contains human- and mouse-specific gene-product reaction rules. MitoMammal is able to model mouse and human mitochondrial metabolism. To demonstrate this, using an adapted E-Flux algorithm, we integrated proteomic data from mitochondria of isolated mouse cardiomyocytes and mouse brown adipocyte tissue, as well as transcriptomic data from in vitro differentiated human brown adipocytes, and modelled the context-specific metabolism using flux balance analysis. In all three simulations, mitoMammal made mostly accurate, and some novel, predictions relating to energy metabolism in the context of cardiomyocytes and brown adipocytes.
This demonstrates its usefulness in research in cardiac disease and diabetes in both mouse and human contexts.</p><p><strong>Availability and implementation: </strong>The MitoMammal Jupyter Notebook is available at: https://gitlab.com/habermann_lab/mitomammal.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbae172"},"PeriodicalIF":2.4,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11696703/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142933933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2024-11-05; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae171
Saish Jaiswal, Hema A Murthy, Manikandan Narayanan
{"title":"SpecGMM: Integrating Spectral analysis and Gaussian Mixture Models for taxonomic classification and identification of discriminative DNA regions.","authors":"Saish Jaiswal, Hema A Murthy, Manikandan Narayanan","doi":"10.1093/bioadv/vbae171","DOIUrl":"10.1093/bioadv/vbae171","url":null,"abstract":"<p><strong>Motivation: </strong>Genomic signal processing (GSP), which transforms biomolecular sequences into discrete signals for spectral analysis, has provided valuable insights into DNA sequence, structure, and evolution. However, challenges persist with spectral representations of variable-length sequences for tasks like species classification and in interpreting these spectra to identify discriminative DNA regions.</p><p><strong>Results: </strong>We introduce SpecGMM, a novel framework that integrates sliding window-based Spectral analysis with a Gaussian Mixture Model to transform variable-length DNA sequences into fixed-dimensional spectral representations for taxonomic classification. SpecGMM's hyperparameters were selected using a dataset of plant sequences, and applied unchanged across diverse datasets, including mitochondrial DNA, viral and bacterial genomes, and 16S rRNA sequences. Across these datasets, SpecGMM outperformed a baseline method, with 9.45% average and 35.55% maximum improvement in test accuracies for a Linear Discriminant classifier. Regarding interpretability, SpecGMM revealed discriminative hypervariable regions in 16S rRNA sequences (particularly V3/V4 for discriminating higher taxa and V2/V3 for lower taxa), corroborating their known classification relevance. SpecGMM's spectrogram video analysis helped visualize species-specific DNA signatures.
SpecGMM thus provides a robust and interpretable method for spectral DNA analysis, opening new avenues in GSP research.</p><p><strong>Availability and implementation: </strong>SpecGMM's source code is available at https://github.com/BIRDSgroup/SpecGMM.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae171"},"PeriodicalIF":2.4,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631429/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142808697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
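The sliding-window spectral step described in the SpecGMM abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the SpecGMM source code: the nucleotide-to-number mapping `ENCODING`, the function names, and the window and step sizes are all hypothetical. The sketch only shows how a sequence of any length yields per-window magnitude spectra of one fixed width; the Gaussian Mixture Model stage that the abstract describes is not shown.

```python
import cmath

# Hypothetical numeric encoding of nucleotides (an assumption for this sketch,
# not the mapping used by SpecGMM).
ENCODING = {"A": 1.0, "C": -1.0, "G": 0.5, "T": -0.5}


def encode(seq):
    """Map a DNA string to a list of floats; unknown symbols become 0.0."""
    return [ENCODING.get(base, 0.0) for base in seq.upper()]


def dft_magnitudes(window):
    """Magnitudes of the discrete Fourier transform for k = 0 .. n // 2."""
    n = len(window)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(window)
        )
        magnitudes.append(abs(total))
    return magnitudes


def sliding_window_spectra(seq, window_size=8, step=4):
    """One magnitude spectrum per window; each has window_size // 2 + 1 bins."""
    signal = encode(seq)
    spectra = []
    for start in range(0, len(signal) - window_size + 1, step):
        spectra.append(dft_magnitudes(signal[start:start + window_size]))
    return spectra


short_spectra = sliding_window_spectra("ACGTACGTACGT")
long_spectra = sliding_window_spectra("ACGTACGTACGTACGTACGT")
```

Both sequences produce spectra with the same number of frequency bins per window, while the number of windows grows with sequence length. A model fitted over these per-window vectors (a GMM in the abstract's description) is what would then turn a variable number of windows into a fixed-dimensional representation.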
{"title":"RecGOBD: accurate recognition of gene ontology related brain development protein functions through multi-feature fusion and attention mechanisms.","authors":"Zhiliang Xia, Shiqiang Ma, Jiawei Li, Yan Guo, Limin Jiang, Jijun Tang","doi":"10.1093/bioadv/vbae163","DOIUrl":"10.1093/bioadv/vbae163","url":null,"abstract":"<p><strong>Motivation: </strong>Protein function prediction is crucial in bioinformatics, driven by the growth of protein sequence data from high-throughput technologies. Traditional methods are costly and slow, underscoring the need for computational solutions. While deep learning offers powerful tools, many models lack optimization for brain development datasets, critical for neurodevelopmental disorder research. To address this, we developed RecGOBD (Recognition of Gene Ontology-related Brain Development protein function), a model tailored to predict protein functions essential to brain development.</p><p><strong>Result: </strong>RecGOBD targets 10 key gene ontology (GO) terms for brain development, embedding protein sequences associated with these terms. Leveraging advanced pre-trained models, it captures both sequence and structure data, aligning them with GO terms through attention mechanisms. The category attention layer enhances prediction accuracy. RecGOBD surpassed five benchmark models in AUROC, AUPR, and Fmax metrics and was further used to predict autism-related protein functions and assess mutation impacts on GO terms. 
These findings highlight RecGOBD's potential in advancing protein function prediction for neurodevelopmental disorders.</p><p><strong>Availability and implementation: </strong>All Python codes associated with this study are available at https://github.com/ZL-Xia/RECGOBD.git.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae163"},"PeriodicalIF":2.4,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639192/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142831054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2024-10-30; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae165
Stephan Breimann, Dmitrij Frishman
{"title":"AAclust: <i>k</i>-optimized clustering for selecting redundancy-reduced sets of amino acid scales.","authors":"Stephan Breimann, Dmitrij Frishman","doi":"10.1093/bioadv/vbae165","DOIUrl":"10.1093/bioadv/vbae165","url":null,"abstract":"<p><strong>Summary: </strong>Amino acid scales are crucial for sequence-based protein prediction tasks, yet no gold standard scale set or simple scale selection methods exist. We developed AAclust, a wrapper for clustering models that require a pre-defined number of clusters <i>k</i>, such as <i>k</i>-means. AAclust obtains redundancy-reduced scale sets by clustering and selecting one representative scale per cluster, where <i>k</i> can either be optimized by AAclust or defined by the user. The utility of AAclust scale selections was assessed by applying machine learning models to 24 protein benchmark datasets. We found that top-performing scale sets were different for each benchmark dataset and significantly outperformed scale sets used in previous studies. Noteworthy is the strong dependence of the model performance on the scale set size. 
AAclust enables a systematic optimization of scale-based feature engineering in machine learning applications.</p><p><strong>Availability and implementation: </strong>The AAclust algorithm is part of AAanalysis, a Python-based framework for interpretable sequence-based protein prediction, which is documented and accessible at https://aaanalysis.readthedocs.io/en/latest and https://github.com/breimanntools/aaanalysis.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae165"},"PeriodicalIF":2.4,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11562964/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
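The idea in the AAclust abstract of clustering scales and keeping one representative per cluster, for a user-defined k, can be sketched as follows. This is a toy illustration under stated assumptions, not the AAclust algorithm or the AAanalysis API: the plain k-means loop, the Euclidean distance, the farthest-point initialisation, the function names, and the four-value toy scales are all hypothetical choices made for this example. Optimisation of k is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(vectors, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [list(vectors[0])]
    while len(centers) < k:
        farthest = max(
            vectors, key=lambda v: min(euclidean(v, c) for c in centers)
        )
        centers.append(list(farthest))
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: euclidean(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centres are final
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = mean_vector(members)
    return labels, centers


def select_representatives(scales, k):
    """Return, per cluster, the name of the scale closest to its centre."""
    names = list(scales)
    labels, centers = kmeans([scales[name] for name in names], k)
    representatives = []
    for c in range(k):
        members = [name for name, lab in zip(names, labels) if lab == c]
        if members:
            representatives.append(
                min(members, key=lambda name: euclidean(scales[name], centers[c]))
            )
    return representatives


# Toy "scales" over four residues only (hypothetical values).
toy_scales = {
    "hydro_1": [1.0, 0.9, 0.1, 0.0],
    "hydro_2": [0.9, 1.0, 0.0, 0.1],
    "hydro_3": [1.0, 1.0, 0.0, 0.0],
    "size_1": [0.0, 1.0, 0.0, 1.0],
    "size_2": [0.2, 0.8, 0.2, 0.8],
    "size_3": [0.1, 0.9, 0.1, 0.9],
}
representatives = select_representatives(toy_scales, k=2)
```

With k = 2 the six toy scales collapse to two representatives, one per group of near-duplicate scales, which is the redundancy reduction the abstract refers to. Real amino acid scales have 20 values each, and a correlation-based similarity may be a more natural choice than Euclidean distance.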