Bioinformatics advances; Pub Date: 2025-04-23; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf093
Tram N Nguyen, Tyrone Lee, Nitesh Turaga, Robert Gentleman, Ludwig Geistlinger, Martin Morgan
{"title":"AlphaMissenseR: an integrated framework for investigating missense mutations in human protein-coding genes.","authors":"Tram N Nguyen, Tyrone Lee, Nitesh Turaga, Robert Gentleman, Ludwig Geistlinger, Martin Morgan","doi":"10.1093/bioadv/vbaf093","DOIUrl":"10.1093/bioadv/vbaf093","url":null,"abstract":"<p><strong>Summary: </strong>AlphaMissense is an AI model from Google DeepMind that predicts the pathogenicity of every possible missense mutation in the human proteome. We present AlphaMissenseR, an R/Bioconductor package that facilitates performant and reproducible access to these predictions and that provides functionality for analysis, visualization, validation, and benchmarking. AlphaMissenseR integrates with Bioconductor facilities for genomic region analysis, and provides multi-level visualization and interactive exploration of variant pathogenicity in a genome browser and on 3D protein structures. In addition, AlphaMissenseR integrates with major clinical and experimental variant databases for contrasting predicted and clinically derived pathogenicity scores, and for systematic benchmarking of existing and new variant effect prediction methods across a large collection of deep mutational scanning assays.</p><p><strong>Availability and implementation: </strong>AlphaMissense data resources are distributed under the CC-BY 4.0 license and the AlphaMissenseR package is available from Bioconductor (https://bioconductor.org/packages/AlphaMissenseR) under the Artistic 2.0 license.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf093"},"PeriodicalIF":2.4,"publicationDate":"2025-04-23","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12040380/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144043436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HORNET: tools to find genes with causal evidence and their regulatory networks using eQTLs.","authors":"Noah Lorincz-Comi, Yihe Yang, Jayakrishnan Ajayakumar, Makaela Mews, Valentina Bermudez, William Bush, Xiaofeng Zhu","doi":"10.1093/bioadv/vbaf068","DOIUrl":"10.1093/bioadv/vbaf068","url":null,"abstract":"<p><strong>Motivation: </strong>Nearly two decades of genome-wide association studies (GWAS) have identified thousands of disease-associated genetic variants, but very few genes with evidence of causality. Recent methodological advances demonstrate that Mendelian randomization (MR) using expression quantitative trait loci (eQTLs) as instrumental variables can detect potential causal genes. However, existing MR approaches are not well suited to handle the complexity of eQTL GWAS data structure and so they are subject to bias, inflation, and incorrect inference.</p><p><strong>Results: </strong>We present a whole-genome regulatory network analysis tool (HORNET), which is a comprehensive set of statistical and computational tools to perform genome-wide searches for causal genes using summary level GWAS data that is robust to biases from multiple sources. 
Applying HORNET to schizophrenia, eQTL effects in the cerebellum were spread throughout the genome, and in the cortex were more localized to select loci.</p><p><strong>Availability and implementation: </strong>Freely available at https://github.com/noahlorinczcomi/HORNET for Mac, Windows, and Linux users.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf068"},"PeriodicalIF":2.4,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12014422/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144012382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2025-04-17; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf078
Jiru Han, Zachary F Gerring, Longfei Wang, Melanie Bahlo
{"title":"GeneSetPheno: a web application for the integration, summary, and visualization of gene and variant-phenotype associations across gene sets.","authors":"Jiru Han, Zachary F Gerring, Longfei Wang, Melanie Bahlo","doi":"10.1093/bioadv/vbaf078","DOIUrl":"https://doi.org/10.1093/bioadv/vbaf078","url":null,"abstract":"<p><strong>Motivation: </strong>The comprehensive study of genotype-phenotype relationships requires the integration of multiple data types to \"triangulate\" signals and derive meaningful biological conclusions. Large-scale biobanks and public resources generate a wealth of comprehensive results, facilitating the discovery of associations between genes or genetic variants and multiple phenotypes. However, analyzing these data across resources presents several challenges, including limited flexibility in gene set analysis, the integration of multiple databases, and the need for effective data visualization to aid interpretation.</p><p><strong>Results: </strong>GeneSetPheno is a user-friendly graphical interface that integrates, summarizes, and visualizes gene and variant-phenotype associations across genomic resources. It allows users to explore interrelationships between genetic variants and phenotypes, offering insights into the genetic factors driving phenotypic variation within user-defined gene sets. GeneSetPheno also supports comparisons across gene sets to identify shared or unique genetic variants, phenotypic associations, biological pathways, and potential gene-gene interactions. GeneSetPheno is a free and highly configurable tool for exploring the complex relationships between gene sets, genetic variants, and phenotypes. Target users include molecular biologists and clinicians who wish to explore a gene or gene set of particular interest.</p><p><strong>Availability and implementation: </strong>GeneSetPheno is freely accessible at: https://shiny.wehi.edu.au/han.ji/GeneSetPheno/. 
The source code is available on GitHub at: https://github.com/bahlolab/GeneSetPheno.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf078"},"PeriodicalIF":2.4,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12011357/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144012367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2025-04-16; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf087
Luciana Martins, João Capela, Emanuel Cunha, Marta Sampaio, Oscar Dias
{"title":"diel_models: a python package for systematic integration of day-night cycles into plant genome-scale metabolic models.","authors":"Luciana Martins, João Capela, Emanuel Cunha, Marta Sampaio, Oscar Dias","doi":"10.1093/bioadv/vbaf087","DOIUrl":"10.1093/bioadv/vbaf087","url":null,"abstract":"<p><strong>Summary: </strong>In recent years, genome-scale metabolic models have become indispensable tools for studying complex metabolic processes occurring within living organisms. Understanding plants' metabolic behaviour under diel cycles (24-h day-night cycles) is essential to explain their adaptive strategies to different light conditions. However, integrating these cycles in plant GEMs is complex, laborious, time-consuming, and not systematized. Here, we present <i>diel_models</i>, a novel python package that enables the systematization and accurate construction of diel models based on non-diel plant GEMs, tailored for generic and multi-tissue models. <i>diel_models</i> is a lightweight, modular package with minimal dependencies and broad Python compatibility (v3.8+), making it easy to use, integrate into reconstruction pipelines, and extend with community-driven enhancements. 
It is also supported on all operating systems, including Windows, MacOS, and Linux, ensuring cross-platform compatibility for a wide range of users.</p><p><strong>Availability and implementation: </strong>The code is freely available at https://github.com/BioSystemsUM/diel_models.git and can be installed using the command pip install diel_models.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf087"},"PeriodicalIF":2.4,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12070391/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144036699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Space: reconciling multiple spatial domain identification algorithms via consensus clustering.","authors":"Daoliang Zhang, Wenrui Li, Xinyi Sui, Na Yu, Shan Wang, Zhiping Liu, Xiaowo Wang, Zhiyuan Yuan, Rui Gao, Wei Zhang","doi":"10.1093/bioadv/vbaf084","DOIUrl":"https://doi.org/10.1093/bioadv/vbaf084","url":null,"abstract":"<p><strong>Motivation: </strong>The rapid development of spatially resolved transcriptomics (SRT) technologies has provided unprecedented opportunities for characterizing and understanding tissue architecture. As this field continues to advance, various methods have been developed to computationally identify spatial domains within tissues. However, the performance of different algorithms on the same dataset is not always consistent. This inconsistency makes it difficult for researchers to select the most reliable results for downstream analysis.</p><p><strong>Results: </strong>To address this challenge, we propose a domain identification method named Space. Space measures consistency between different methods to select reliable algorithms. It then constructs a consensus matrix to integrate the outputs from multiple algorithms. We introduce similarity loss, spatial loss, and low-rank loss in Space to enhance the accuracy and optimize computational efficiency. This strategy not only resolves the inconsistency of clustering labels among different methods but also achieves highly reliable clustering output. Flexible interfaces are also provided for downstream analysis such as visualization, domain-specific gene analysis and trajectory inference. 
Testing results on multiple publicly available SRT datasets demonstrate that Space performs exceptionally well in deciphering key tissue structures and biological features.</p><p><strong>Availability and implementation: </strong>The Space package can be easily installed through conda or mamba, and its source code is available at https://honchkrow.github.io/Space.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf084"},"PeriodicalIF":2.4,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12037102/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144043438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2025-04-11; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf088
Boris Minasenko, Dongxue Wang, Piera Cirillo, Nickilou Krigbaum, Barbara Cohn, Dean P Jones, Jeffrey M Collins, Xin Hu
{"title":"Rodin: a streamlined metabolomics data analysis and visualization tool.","authors":"Boris Minasenko, Dongxue Wang, Piera Cirillo, Nickilou Krigbaum, Barbara Cohn, Dean P Jones, Jeffrey M Collins, Xin Hu","doi":"10.1093/bioadv/vbaf088","DOIUrl":"10.1093/bioadv/vbaf088","url":null,"abstract":"<p><strong>Summary: </strong>Recent advances in high-resolution mass spectrometry have revolutionized metabolomics, enabling the profiling of hundreds of thousands of metabolic features in a single experiment, with widespread applications across health sciences. To streamline analysis of metabolomics data, we developed Rodin, a Python-based application offering fast, efficient processing of large datasets via a web interface or programming library. Rodin integrates multiple stages of analysis, including feature preprocessing, statistical testing, interactive visualizations, and pathway analysis, generating outputs while tracking user-defined parameters within a single page. By enhancing the accessibility of tools for metabolomics data analysis, Rodin not only streamlines the workflow but also enhances analytic throughput by enabling a broader range of users to perform these analyses. Compared to other tools, Rodin excels in user-friendliness, ease of access, and seamless integration of multiple functionalities, enabling reproducible, efficient workflows for users of all computational skill levels.</p><p><strong>Availability and implementation: </strong>Web interface: https://rodin-meta.com/. 
Python library: https://github.com/BM-Boris/rodin.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf088"},"PeriodicalIF":2.4,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12034380/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144054621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2025-04-10; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf080
Sneha Mitra, Alexander J Hartemink
{"title":"Inferring differential protein binding from time-series chromatin accessibility data.","authors":"Sneha Mitra, Alexander J Hartemink","doi":"10.1093/bioadv/vbaf080","DOIUrl":"10.1093/bioadv/vbaf080","url":null,"abstract":"<p><strong>Motivation: </strong>Due to internal and external factors, the epigenomic landscape is constantly changing in ways that are linked to changes in gene expression. Chromatin accessibility data, such as MNase-seq, provide valuable insights into this landscape and have been used to compute chromatin occupancy profiles. Multiple datasets generated over time or under different conditions can thus be used to study dynamic changes in chromatin occupancy across the genome.</p><p><strong>Results: </strong>Our existing model, RoboCOP, computes a genome-wide chromatin occupancy profile for nucleosomes and hundreds of transcription factors. Here, we present a new method called DynaCOP that takes multiple chromatin occupancy profiles and uses them to generate a series of nucleosome-guided difference profiles. These profiles identify differentially binding transcription factors and reveal changes in nucleosome occupancy and positioning. We apply DynaCOP to chromatin occupancy profiles derived from deeply sequenced time-series MNase-seq data to study differential chromatin occupancy in the yeast genome under cadmium stress. 
We find strong correlations between the observed chromatin changes and changes in transcription.</p><p><strong>Availability and implementation: </strong>https://github.com/HarteminkLab/RoboCOP.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf080"},"PeriodicalIF":2.4,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12037103/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144008364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2025-04-10; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf071
Francisco M De La Vega, Sean A Irvine, Pavana Anur, Kelly Potts, Lewis Kraft, Raul Torres, Peter Kang, Sean Truong, Yeonghun Lee, Shunhua Han, Vitor Onuchic, James Han
{"title":"Benchmarking of germline copy number variant callers from whole genome sequencing data for clinical applications.","authors":"Francisco M De La Vega, Sean A Irvine, Pavana Anur, Kelly Potts, Lewis Kraft, Raul Torres, Peter Kang, Sean Truong, Yeonghun Lee, Shunhua Han, Vitor Onuchic, James Han","doi":"10.1093/bioadv/vbaf071","DOIUrl":"10.1093/bioadv/vbaf071","url":null,"abstract":"<p><strong>Motivation: </strong>Whole-genome sequencing (WGS) is increasingly preferred for clinical applications due to its comprehensive coverage, effectiveness in detecting copy number variants (CNVs), and declining costs. However, systematic evaluations of WGS CNV callers tailored to germline clinical testing, where high sensitivity and confirmation of reported CNVs are essential, remain necessary. Clinical reporting typically emphasizes CNVs affecting coding regions over precise breakpoint detection. This study benchmarks several short-read WGS CNV detection tools using reference cell lines to inform their clinical use.</p><p><strong>Results: </strong>While tools vary in sensitivity (7%-83%) and precision (1%-76%), few meet the sensitivity needed for clinical testing. Callers generally perform better for deletions (up to 88% sensitivity) than duplications (up to 47% sensitivity), with poor detection of duplications under 5 kb. Notably, for CNVs in genes commonly included in clinical panels, significantly improved sensitivity and precision were observed when benchmarking against 25 cell lines with known CNVs. DRAGEN v4.2 high-sensitivity CNV calls, post-processed with custom filters, achieved 100% sensitivity and 77% precision on the optimized gene panel after excluding recurring artifacts. 
This level of performance may support clinical use with orthogonal confirmation of reportable CNVs, pending validation on laboratory-specific samples.</p><p><strong>Availability and implementation: </strong>The data underlying this article are available in the European Nucleotide Archive under project accession PRJEB87628.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf071"},"PeriodicalIF":2.4,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12005901/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144031521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2025-04-10; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf081
Pengyao Ping, Jinyan Li
{"title":"Construction of edit-distance graphs for large sets of short reads through minimizer-bucketing.","authors":"Pengyao Ping, Jinyan Li","doi":"10.1093/bioadv/vbaf081","DOIUrl":"https://doi.org/10.1093/bioadv/vbaf081","url":null,"abstract":"<p><strong>Motivation: </strong>Pairs of short reads with small edit distances, along with their unique molecular identifier tags, have been exploited to correct sequencing errors in both reads and tags. However, brute-force identification of these pairs is impractical for large datasets containing ten million or more reads due to its quadratic complexity. Minimizer-bucketing and locality-sensitive hashing have been used to partition read sets into buckets of similar reads, allowing edit-distance calculations only within each bucket. However, challenges like minimizing missing pairs, optimizing bucketing parameters, and exploring combination bucketing to improve pair detection remain.</p><p><strong>Results: </strong>We define an edit-distance graph for a set of short reads, where nodes represent reads, and edges connect reads with small edit distances, and present a heuristic method, reads2graph, for high completeness of edge detection. Reads2graph uses three techniques: minimizer-bucketing, an improved Order-Min-Hash technique to divide large bins, and a novel graph neighbourhood multi-hop traversal within large bins to detect more edges. We then establish optimal bucketing settings to maximize ground truth edge coverage per bin. 
Extensive testing demonstrates that reads2graph can achieve 97%-100% completeness in most cases, outperforming brute-force identification in speed while providing a superior speed-completeness balance compared to using a single bucketing method like Miniception or Order-Min-Hash.</p><p><strong>Availability and implementation: </strong>reads2graph is publicly available at https://github.com/JappyPing/reads2graph.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf081"},"PeriodicalIF":2.4,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12040381/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144057920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2025-04-09; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf065
{"title":"Correction to: RecGOBD: accurate recognition of gene ontology related brain development protein functions through multi-feature fusion and attention mechanisms.","authors":"","doi":"10.1093/bioadv/vbaf065","DOIUrl":"https://doi.org/10.1093/bioadv/vbaf065","url":null,"abstract":"<p>[This corrects the article DOI: 10.1093/bioadv/vbae163.]</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf065"},"PeriodicalIF":2.4,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11981713/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144030060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}