Beatriz Vieira Mourato, Ivan Tsers, Svenja Denker, Fabian Klötzl, Bernhard Haubold
{"title":"Marker Discovery in the Large","authors":"Beatriz Vieira Mourato, Ivan Tsers, Svenja Denker, Fabian Klötzl, Bernhard Haubold","doi":"10.1093/bioadv/vbae113","DOIUrl":"https://doi.org/10.1093/bioadv/vbae113","url":null,"abstract":"Markers for polymerase chain reaction are routinely constructed by taking regions common to the genomes of a target organism and subtracting the regions found in the targets’ closest relatives, their neighbors. This approach is implemented in the published package Fur, which originally required memory proportional to the number of nucleotides in the neighborhood. This does not scale well. Here we describe a new version of Fur that only requires memory proportional to the longest neighbor. In spite of its greater memory efficiency, the new Fur remains fast and is accurate. We demonstrate this through application to simulated sequences and comparison to an efficient alternative. Then we use the new Fur to extract markers from 120 reference bacteria. To make this feasible, we also introduce software for automatically finding target and neighbor genomes and for assessing markers. We pick the best primers from the ten most sequenced reference bacteria and show their excellent in silico sensitivity and specificity. Fur is available from github.com/evolbioinf/fur, in the Docker image hub.docker.com/r/beatrizvm/mapro, and in the Code Ocean capsule 10.24433/CO.7955947.v1.","PeriodicalId":505477,"journal":{"name":"Bioinformatics Advances","volume":"91 12","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141798031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ZMPY3D: Accelerating protein structure volume analysis through vectorized 3D Zernike Moments and Python-based GPU Integration","authors":"Jhih Siang Lai, Stephen K. Burley, José M. Duarte","doi":"10.1093/bioadv/vbae111","DOIUrl":"https://doi.org/10.1093/bioadv/vbae111","url":null,"abstract":"Volumetric 3D object analyses are being applied in research fields such as structural bioinformatics, biophysics, and structural biology, with potential integration of Artificial Intelligence/Machine Learning (AI/ML) techniques. One such method, 3D Zernike moments, has proven valuable in analyzing protein structures (e.g., protein fold classification, protein-protein interaction analysis, and molecular dynamics simulations). Their compactness and efficiency make them amenable to large-scale analyses. Established methods for deriving 3D Zernike moments, however, can be inefficient, particularly when higher order terms are required, hindering broader applications. As the volume of experimental and computationally-predicted protein structure information continues to increase, structural biology has become a “big data” science requiring more efficient analysis tools. This application note presents a Python-based software package, ZMPY3D, to accelerate computation of 3D Zernike moments by vectorizing the mathematical formulae and using graphics processing units (GPUs). The package offers popular GPU-supported libraries such as CuPy and TensorFlow together with NumPy implementations, aiming to improve computational efficiency, adaptability, and flexibility in future algorithm development. The ZMPY3D package can be installed via PyPI, and the source code is available from GitHub. Volumetric-based protein 3D structural similarity scores and transform matrix of superposition functionalities have both been implemented, creating a powerful computational tool that will allow the research community to amalgamate 3D Zernike moments with existing AI/ML tools, to advance research and education in protein structure bioinformatics. ZMPY3D, implemented in Python, is available on GitHub (https://github.com/tawssie/ZMPY3D) and PyPI, released under the GPL License. Supplementary data are available at Bioinformatics Advances online.","PeriodicalId":505477,"journal":{"name":"Bioinformatics Advances","volume":"26 24","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141802818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TWCOM: an R package for inference of cell-cell communication on spatially resolved transcriptomics data","authors":"Dongyuan Wu, Susmita Datta","doi":"10.1093/bioadv/vbae101","DOIUrl":"https://doi.org/10.1093/bioadv/vbae101","url":null,"abstract":"The inference of cell-cell communication is important, as it unveils the intricate cellular behaviors at the molecular level, providing crucial insights essential for understanding complex biological processes and informing targeted interventions in various pathological contexts. Here, we present TWCOM, an R package that implements a Tweedie distribution-based model for accurate cell-cell communication inference. Operating under a generalized additive model framework, TWCOM adeptly handles both single-cell resolution and spot-based spatially resolved transcriptomics data, providing a versatile tool for robust biological sample analysis. The R package TWCOM is available at https://github.com/dongyuanwu/TWCOM. Comprehensive documentation is included with the package.","PeriodicalId":505477,"journal":{"name":"Bioinformatics Advances","volume":"1 9","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141641772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maryna K. Chepeleva, T. Kaoma, Andrei Zinovyev, Reka Toth, Petr V Nazarov
{"title":"consICA: an R package for robust reference-free deconvolution of multi-omics data","authors":"Maryna K. Chepeleva, T. Kaoma, Andrei Zinovyev, Reka Toth, Petr V Nazarov","doi":"10.1093/bioadv/vbae102","DOIUrl":"https://doi.org/10.1093/bioadv/vbae102","url":null,"abstract":"Deciphering molecular signals from omics data helps in understanding cellular processes and disease progression. Effective algorithms for extracting these signals are essential, with a strong emphasis on robustness and reproducibility. R/Bioconductor package consICA implements consensus independent component analysis (ICA) – a data-driven deconvolution method to decompose heterogeneous omics data and extract features suitable for patient stratification and multimodal data integration. The method separates biologically relevant molecular signals from technical effects and provides information about the cellular composition and biological processes. Built-in annotation, survival analysis and report generation provide useful tools for interpretation of extracted signals. The implementation of parallel computing in the package ensures efficient analysis using modern multicore systems. The package offers a reproducible and efficient data-driven solution for the analysis of complex molecular profiles, with significant implications for cancer research. The package is implemented in R and available under MIT license at Bioconductor (https://bioconductor.org/packages/consICA) or at GitHub (https://github.com/biomod-lih/consICA). Supplementary data are available at Bioinformatics Advances online.","PeriodicalId":505477,"journal":{"name":"Bioinformatics Advances","volume":"49 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141652081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Yves Moreau has received the 2023 Einstein Foundation Individual Award for Promoting Quality in Research","authors":"Thomas Lengauer","doi":"10.1093/bioadv/vbae039","DOIUrl":"https://doi.org/10.1093/bioadv/vbae039","url":null,"abstract":"","PeriodicalId":505477,"journal":{"name":"Bioinformatics Advances","volume":"48 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140366667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of bitterness based on modular designed graph neural network","authors":"Yi He, Kaifeng Liu, Yuyang Liu, Weiwei Han","doi":"10.1093/bioadv/vbae041","DOIUrl":"https://doi.org/10.1093/bioadv/vbae041","url":null,"abstract":"Abstract Motivation Bitterness plays a pivotal role in our ability to identify and evade harmful substances in food. As one of the five tastes, it constitutes a critical component of our sensory experiences. However, the reliance on human tasting for discerning flavors presents cost challenges, rendering in silico prediction of bitterness a more practical alternative. Results In this study, we introduce the use of Graph Neural Networks (GNNs) in bitterness prediction, superseding traditional machine learning techniques. We developed an advanced model, a Hybrid Graph Neural Network (HGNN), surpassing conventional GNNs according to tests on public datasets. Using HGNN and three other GNNs, we designed BitterGNNs, a bitterness predictor that achieved an AUC value of 0.87 in both external bitter/non-bitter and bitter/sweet evaluations, outperforming the acclaimed RDKFP-MLP predictor with AUC values of 0.86 and 0.85. We further created a bitterness prediction website and database, TastePD (https://www.tastepd.com/). The BitterGNNs predictor, built on GNNs, offers accurate bitterness predictions, enhancing the efficacy of bitterness prediction, aiding advanced food testing methodology development, and deepening our understanding of bitterness origins. Availability and implementation TastePD is available at https://www.tastepd.com; all code is at https://github.com/heyigacu/BitterGNN.","PeriodicalId":505477,"journal":{"name":"Bioinformatics Advances","volume":"154 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140393595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Justin Womack, Viren Shah, Said H Audi, Scott S. Terhune, R. K. Dash
{"title":"BioModME for building and simulating dynamic computational models of complex biological systems","authors":"Justin Womack, Viren Shah, Said H Audi, Scott S. Terhune, R. K. Dash","doi":"10.1093/bioadv/vbae023","DOIUrl":"https://doi.org/10.1093/bioadv/vbae023","url":null,"abstract":"Molecular mechanisms of biological functions and disease processes are exceptionally complex, and our ability to interrogate and understand relationships is becoming increasingly dependent on the use of computational modeling. We have developed “BioModME”, a standalone R-based web application package, providing an intuitive and comprehensive graphical user interface to help investigators build, solve, visualize, and analyze computational models of complex biological systems. Some important features of the application package include multi-region system modeling, custom reaction rate laws and equations, unit conversion, model parameter estimation utilizing experimental data, and import and export of model information in the Systems Biology Markup Language format. The users can also export models to MATLAB, R, and Python languages and the equations to LaTeX and Mathematical Markup Language formats. Other important features include an online model development platform, multi-modality visualization tool, and efficient numerical solvers for differential-algebraic equations and optimization. All relevant software information including documentation and tutorials can be found at https://mcw.marquette.edu/biomedical-engineering/computational-systems-biology-lab/biomodme.php. Deployed software can be accessed at https://biomodme.ctsi.mcw.edu/. Source code is freely available for download at https://github.com/MCWComputationalBiologyLab/BioModME. Supplementary data are available at Bioinformatics Advances online.","PeriodicalId":505477,"journal":{"name":"Bioinformatics Advances","volume":"2 8","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139957965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative transcriptomic and epigenomic data analysis: a primer","authors":"Louis Coussement, Wim Van Criekinge, Tim de Meyer","doi":"10.1093/bioadv/vbae019","DOIUrl":"https://doi.org/10.1093/bioadv/vbae019","url":null,"abstract":"The advent of microarray and second generation sequencing technology has revolutionized the field of molecular biology, allowing researchers to quantitatively assess transcriptomic and epigenomic features in a comprehensive and cost-efficient manner. Moreover, technical advancements have pushed the resolution of these sequencing techniques to the single cell level. As a result, the bottleneck of molecular biology research has shifted from the bench to the subsequent omics data analysis. Even though most methodologies share the same general strategy, state-of-the-art literature typically focuses on data type specific approaches and already assumes expert knowledge. Here, however, we aim at providing conceptual insight into the principles of genome-wide quantitative transcriptomic and epigenomic (including open chromatin assay) data analysis by describing a generic workflow. By starting from a general framework and its assumptions, the need for alternative or additional data-analytical solutions when working with specific data types becomes clear, and such solutions are hence introduced. Thus, we aim to enable readers with basic omics expertise to deepen their conceptual and statistical understanding of general strategies and pitfalls in omics data analysis and to facilitate subsequent progression to more specialized literature. Supplementary data are available at Bioinformatics Advances online.","PeriodicalId":505477,"journal":{"name":"Bioinformatics Advances","volume":" 670","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139787184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
April Shen, Marcos Casado Barbero, Baron Koylass, Kirill Tsukanov, Tim Cezard, Thomas M Keane
{"title":"CMAT: ClinVar Mapping and Annotation Toolkit","authors":"April Shen, Marcos Casado Barbero, Baron Koylass, Kirill Tsukanov, Tim Cezard, Thomas M Keane","doi":"10.1093/bioadv/vbae018","DOIUrl":"https://doi.org/10.1093/bioadv/vbae018","url":null,"abstract":"Semantic ontology mapping of clinical descriptors with disease outcome is essential. ClinVar is a key resource for human variation with known clinical significance. We present CMAT, a software toolkit and curation protocol for accurately enriching ClinVar releases with disease ontology associations and complex functional consequences. The software and ontology mappings can be obtained from: https://github.com/EBIvariation/CMAT. Supplementary data are available at Bioinformatics Advances online.","PeriodicalId":505477,"journal":{"name":"Bioinformatics Advances","volume":"9 38","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139795050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}