Bioinformatics advances | Pub Date: 2026-04-13 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag107
Gal Gilad, Roded Sharan
{"title":"An optimization framework for hierarchical clustering.","authors":"Gal Gilad, Roded Sharan","doi":"10.1093/bioadv/vbag107","DOIUrl":"https://doi.org/10.1093/bioadv/vbag107","url":null,"abstract":"<p><strong>Motivation: </strong>Hierarchical clustering is a fundamental problem in computational biology, with popular greedy heuristics such as average linkage dating back to the 1950s but no well-defined objective. Recently, a combinatorial optimization criterion for the problem was suggested by Dasgupta. While minimizing this criterion is NP-hard, the popular average linkage method serves as a strong baseline. Nevertheless, its myopic, greedy nature frequently leads to structurally suboptimal hierarchies.</p><p><strong>Results: </strong>To remedy this, we introduce a novel average-linkage-based clustering approach that combines local and global considerations by generating multiple views of the input data and learning how to blend them into an integrated similarity measure. We demonstrate that our method, DOMUS, consistently outperforms strong baselines, including a beam search heuristic, on a wide range of synthetic and classic benchmark datasets. 
Furthermore, we validate its real-world applicability through a rigorous benchmark on single-cell RNA sequencing data, where it compares favorably with the state-of-the-art HiDeF algorithm.</p><p><strong>Availability and implementation: </strong>The DOMUS framework is implemented in Python and freely available at https://github.com/GalGilad/DOMUS.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag107"},"PeriodicalIF":2.8,"publicationDate":"2026-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13128330/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147824163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
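The Dasgupta criterion mentioned in the abstract above can be illustrated with a small sketch. For a tree over the data points, the cost sums, over every pair of leaves, their similarity multiplied by the number of leaves under their lowest common ancestor, so similar points are cheaper when they merge low in the tree. The nested-tuple tree encoding, the function names, and the toy similarity matrix below are assumptions made for illustration; this is not the DOMUS implementation.

```python
from itertools import product


def leaves(tree):
    """Return the leaf labels under a nested-tuple binary tree."""
    if not isinstance(tree, tuple):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def dasgupta_cost(tree, sim):
    """Sum over leaf pairs of sim[i][j] * (number of leaves under their LCA)."""
    if not isinstance(tree, tuple):
        return 0.0
    left, right = tree
    l_leaves, r_leaves = leaves(left), leaves(right)
    size = len(l_leaves) + len(r_leaves)
    # Pairs separated at this node have this node as their lowest common ancestor.
    split = sum(sim[i][j] for i, j in product(l_leaves, r_leaves))
    return size * split + dasgupta_cost(left, sim) + dasgupta_cost(right, sim)


# Toy symmetric similarity matrix (illustrative values): {0, 1} and {2, 3} are tight pairs.
sim = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.8],
    [0.1, 0.1, 0.8, 0.0],
]
good_tree = ((0, 1), (2, 3))  # merges the similar pairs first
bad_tree = ((0, 2), (1, 3))   # merges dissimilar pairs first

print(dasgupta_cost(good_tree, sim), dasgupta_cost(bad_tree, sim))
```

On this toy input the hierarchy that merges the similar pairs first has the lower cost, which is the behaviour the criterion is meant to reward.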
Bioinformatics advances | Pub Date: 2026-04-09 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag101
Roman Kouznetsov, Jackson Loper, Jeffrey Regier
{"title":"Graph convolutional networks for inferring cell-cell communication from spatial transcriptomics data.","authors":"Roman Kouznetsov, Jackson Loper, Jeffrey Regier","doi":"10.1093/bioadv/vbag101","DOIUrl":"https://doi.org/10.1093/bioadv/vbag101","url":null,"abstract":"<p><strong>Motivation: </strong>Single-cell spatial transcriptomics provides gene expression measurements of individual cells while preserving their spatial positions within tissue. Cell-cell communication (CCC) can be inferred by comparing the predictions of held-out gene expression levels by a pair of models: one that incorporates cellular neighborhood information and another that does not. The performance gap indicates the influence of CCC. However, existing methods that adopt this general approach often rely on spatially informed models that use simplistic representations of spatial context. This reliance on such representations does not merely lead to suboptimal predictions: it undermines the validity of the model comparison itself, which hinges on the accurate estimation of conditional expectations.</p><p><strong>Results: </strong>We propose using a graph convolutional network (GCN) as a highly expressive spatially informed model, with cells as nodes and spatial proximity as edges. In semi-synthetic datasets, we show that several existing approaches relying on simplistic neighborhood features can produce spurious inferences about CCC, whereas our GCN-based approach avoids these pitfalls. 
In MERFISH and Xenium mouse brain tissue, our method identifies genes with known spatial variation, suggesting that it successfully infers CCC-affected genes.</p><p><strong>Availability and implementation: </strong>Code to reproduce our results is available from https://github.com/prob-ml/spice.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag101"},"PeriodicalIF":2.8,"publicationDate":"2026-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13110010/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147790551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2026-04-03 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag094
Rosaria Tornisiello, Helene Kretzmer
{"title":"scGeno: a Hidden Markov Model approach to denoise chromosome-scale genotypes from single-cell data.","authors":"Rosaria Tornisiello, Helene Kretzmer","doi":"10.1093/bioadv/vbag094","DOIUrl":"https://doi.org/10.1093/bioadv/vbag094","url":null,"abstract":"<p><strong>Motivation: </strong>Single-cell analysis of monoallelic expression and genomic imprinting requires accurate genotype determination at the cellular level. However, genotype inference from single-cell RNA sequencing data is challenging due to technical noise, allelic dropout, and sparse gene expression patterns, particularly in genetically heterogeneous populations.</p><p><strong>Results: </strong>Here, we present scGeno, a categorical Hidden Markov Model that infers chromosome-level genotype states in organisms with mixed genotypes by modeling sequential gene expression ratios from single-cell RNA sequencing data. Our method leverages the sequential continuity of the genotype states along chromosomes to overcome single-cell data limitations and generates chromosome-resolved, comprehensive genotype maps for individual samples. Our probabilistic framework accounts for technical noise while maintaining high accuracy in genotype assignment. Validation on experimental data demonstrates robust performance in determining clear genotypic states, thereby enabling systematic investigation of allele-specific expression patterns at single-cell resolution.</p><p><strong>Availability and implementation: </strong>scGeno is an open-source Python package under an MIT license. 
Source code, documentation, and installation instructions can be downloaded from GitHub (https://github.com/RosariaTornisiello/Genotype_HMM.git).</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag094"},"PeriodicalIF":2.8,"publicationDate":"2026-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13075984/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147694163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
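The idea of exploiting continuity of genotype states along a chromosome, as described in the abstract above, can be sketched with a generic categorical Hidden Markov Model decoded by the Viterbi algorithm. Everything concrete here is an assumption for illustration: the three state names, the three discretized allelic-ratio bins, and all probabilities are hypothetical and are not taken from scGeno.

```python
import math

STATES = ("AA", "AB", "BB")        # hypothetical genotype states
SYMBOLS = ("ref", "mixed", "alt")  # hypothetical bins of per-gene allelic ratio


def viterbi(obs, start, trans, emit):
    """Return the most likely state path for a categorical HMM (log-space)."""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in STATES}
    back = []
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + math.log(trans[r][s]))
            score[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][o])
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


start = {s: 1 / 3 for s in STATES}
# "Sticky" transitions encode continuity of the genotype along the chromosome.
trans = {r: {s: (0.98 if r == s else 0.01) for s in STATES} for r in STATES}
emit = {
    "AA": {"ref": 0.80, "mixed": 0.15, "alt": 0.05},
    "AB": {"ref": 0.25, "mixed": 0.50, "alt": 0.25},
    "BB": {"ref": 0.05, "mixed": 0.15, "alt": 0.80},
}

# Genes ordered along one chromosome; the third observation is a noisy outlier.
genes = ["ref", "ref", "alt", "ref", "ref", "alt", "alt", "alt", "alt"]
path = viterbi(genes, start, trans, emit)
print(path)
```

With these toy parameters the isolated "alt" observation is absorbed as noise, and the decoded path is five AA calls followed by four BB calls, because a state switch is only worthwhile when several consecutive genes support it.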
Bioinformatics advances | Pub Date: 2026-03-30 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag095
Luis U Aguilera, William S Raymond, Rhiannon M Sears, Nathan L Nowling, Brian Munsky, Ning Zhao
{"title":"MicroLive: an image processing toolkit for quantifying live-cell single-molecule microscopy.","authors":"Luis U Aguilera, William S Raymond, Rhiannon M Sears, Nathan L Nowling, Brian Munsky, Ning Zhao","doi":"10.1093/bioadv/vbag095","DOIUrl":"10.1093/bioadv/vbag095","url":null,"abstract":"<p><strong>Motivation: </strong>Advances in live-cell fluorescence microscopy have enabled us to visualize single molecules (such as mRNAs and nascent proteins) in real time with high spatiotemporal resolution. However, these experiments generate large datasets that require complex computational processing pipelines to derive meaningful and quantitative information, which is a technical barrier for many researchers.</p><p><strong>Results: </strong>Here, we introduce MicroLive, an open-source Python-based application for quantifying live-cell microscopy images. MicroLive provides an interactive Graphical User Interface (GUI) to perform key tasks, including cell segmentation, photobleaching correction, single-particle detection/tracking, spot intensity quantification, inter-channel colocalization, and time-series correlation analysis. As a ground-truth testing dataset, we used synthetic live-cell imaging data generated with the rSNAPed toolkit, demonstrating accurate extraction of biologically relevant parameters. Microscopy images of U-2 OS cells expressing a gene construct smHA-KDM5B-BoxB-MS2 were used to demonstrate the use of this software.</p><p><strong>Availability and implementation: </strong>MicroLive is distributed under a GPLv3 license and available on GitHub https://github.com/ningzhaoAnschutz/microlive. 
It can be installed via pip: pip install microlive.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag095"},"PeriodicalIF":2.8,"publicationDate":"2026-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13080936/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147700764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2026-03-28 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag092
Ahmed Salah, Sebastian Zahnreich, Federico Marini
{"title":"DoReMiTra: an R/Bioconductor data package for orchestrating the analysis of radiation transcriptomic studies.","authors":"Ahmed Salah, Sebastian Zahnreich, Federico Marini","doi":"10.1093/bioadv/vbag092","DOIUrl":"https://doi.org/10.1093/bioadv/vbag092","url":null,"abstract":"<p><strong>Summary: </strong>Understanding the molecular impact of ionizing radiation exposure is essential for both biomedical research and public health. Among the possible approaches to study this phenomenon, gene expression profiling via transcriptomics assays has been a valuable approach over the last decades to unravel the mechanisms of cellular responses to radiation. To our knowledge, there is no data package gathering well-curated radiation transcriptomic datasets covering microarrays and, more recently, RNA sequencing. Therefore, we present DoReMiTra, an R/Bioconductor data package that represents the first unified radiation transcriptomics dataset collection integrated with Bioconductor's ExperimentHub for efficient distribution. DoReMiTra standardizes and harmonizes sample-level metadata and provides pre-processed SummarizedExperiment objects to facilitate comparative analyses. Additionally, we introduce a lightweight Shiny app interface for interactive visualization and preliminary exploration. 
DoReMiTra serves as a valuable resource and tool in radiation research for benchmarking, integrative analyses, and biomarker discovery.</p><p><strong>Availability and implementation: </strong>DoReMiTra is available under the MIT license at https://bioconductor.org/packages/DoReMiTra.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag092"},"PeriodicalIF":2.8,"publicationDate":"2026-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13143432/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147846524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2026-03-28 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag093
Elif Kardelen Çağdaş, Hüseyin Şan, Berkay Çağdaş, Emre Hafızoğlu
{"title":"From pathways to prediction: a comparative machine learning framework for prostate cancer survival.","authors":"Elif Kardelen Çağdaş, Hüseyin Şan, Berkay Çağdaş, Emre Hafızoğlu","doi":"10.1093/bioadv/vbag093","DOIUrl":"10.1093/bioadv/vbag093","url":null,"abstract":"<p><strong>Motivation: </strong>Prostate cancer shows substantial clinical and molecular heterogeneity, limiting the prognostic accuracy of conventional clinicopathologic models. Single-gene alterations and tumor mutational burden provide limited prognostic discrimination. Pathway-level genomic abstraction may better capture cumulative oncogenic disruption.</p><p><strong>Results: </strong>Genomic and clinical data from 2231 prostate adenocarcinoma patients were analyzed by mapping somatic mutations to 11 cancer-related signaling pathways. A composite pathway-based risk score integrating pathway burden, p53 pathway status, and high-risk co-alterations was developed and evaluated using survival analysis, Cox regression, time-dependent receiver operating characteristic curves, and machine-learning models, with generalizability assessed in an independent external cohort. The score stratified patients into distinct risk groups with significantly different overall survival (log-rank P < .0001); each one-point increase was associated with a 31% higher mortality risk (hazard ratio 1.31, 95% confidence interval 1.21-1.42). The model showed moderate discrimination (concordance index 0.5897) and more stable predictive performance than tumor mutational burden alone. Machine-learning models achieved similar performance, and feature importance analysis identified p53 pathway disruption and pathway burden as key predictors. 
The proposed framework is a mutation-based genomic risk-stratification tool derived from targeted-sequencing data that provides interpretable prognostic stratification with performance comparable to machine-learning models.</p><p><strong>Availability and implementation: </strong>Available upon request.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag093"},"PeriodicalIF":2.8,"publicationDate":"2026-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13091648/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147724900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
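The per-point hazard ratio reported in the abstract above (1.31, 95% confidence interval 1.21-1.42) can be scaled to larger score differences, assuming the risk score enters the Cox model as a single linear term so that hazard ratios multiply across points. That linearity is an assumption suggested by the "each one-point increase" wording, and the function name is hypothetical; the sketch is arithmetic only, not the authors' code.

```python
import math


def hazard_ratio_for_increase(hr_per_point, points):
    """Scale a per-point hazard ratio to a multi-point increase (log-linear model)."""
    return math.exp(points * math.log(hr_per_point))


# Values reported in the abstract: point estimate and 95% confidence limits.
hr, ci_low, ci_high = 1.31, 1.21, 1.42

for k in (1, 2, 3):
    print(
        k,
        round(hazard_ratio_for_increase(hr, k), 2),
        round(hazard_ratio_for_increase(ci_low, k), 2),
        round(hazard_ratio_for_increase(ci_high, k), 2),
    )
```

Under this assumption a three-point difference in the score corresponds to a hazard ratio of about 1.31^3 ≈ 2.25.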
Bioinformatics advances | Pub Date: 2026-03-26 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag089
Hanna Lee, Denalda Gashi, Syeda Maheen Batool, Ana K Escobedo, Allegra A Petti, Bob S Carter, Leonora Balaj
{"title":"m6AnetAnalyzer: an R toolkit for post-processing of m6A sites detected by m6Anet.","authors":"Hanna Lee, Denalda Gashi, Syeda Maheen Batool, Ana K Escobedo, Allegra A Petti, Bob S Carter, Leonora Balaj","doi":"10.1093/bioadv/vbag089","DOIUrl":"10.1093/bioadv/vbag089","url":null,"abstract":"<p><p>m6AnetAnalyzer is an R package that streamlines post-processing and interpretation of site-level m6A predictions from m6Anet. It summarizes m6A distributions across transcripts, genes, biotypes, and transcript regions, and enables functional annotation using user-provided BED files or built-in datasets, including RNA-binding proteins and SNPs. Condition-specific changes in m6A methylation are quantified using the log2-transformed weighted modification ratio, with statistical tests applied when appropriate to identify significant differential methylation. By integrating differential gene expression data, m6AnetAnalyzer links methylation changes with expression differences, offering biotype- and region-specific insights into how m6A localization patterns relate to transcriptional regulation. <b>Availability and implementation:</b> m6AnetAnalyzer is freely available at https://github.com/hannalee809/m6AnetAnalyzer. It is compatible with Linux, macOS, and Windows platforms. 
Detailed installation instructions, example input and output files, and a step-by-step analysis workflow are provided in the package vignette.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag089"},"PeriodicalIF":2.8,"publicationDate":"2026-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13069878/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147678761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2026-03-26 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag087
Md Muhaiminul Islam Nafi, Khandokar Md Rahat Hossain
{"title":"DeepBCTPred: deep learning-based prediction of bladder cancer tissues from endoscopic images.","authors":"Md Muhaiminul Islam Nafi, Khandokar Md Rahat Hossain","doi":"10.1093/bioadv/vbag087","DOIUrl":"https://doi.org/10.1093/bioadv/vbag087","url":null,"abstract":"<p><strong>Motivation: </strong>Bladder cancer is one of the most prevalent malignancies worldwide, affecting the tissues of the urinary bladder and posing a significant threat to patient survival and quality of life. Accurate classification of bladder cancer tissue is critical for early diagnosis and patient survival, yet conventional methods suffer from subjective interpretation and human error.</p><p><strong>Results: </strong>We propose DeepBCTPred, a novel deep learning framework that integrates handcrafted and learned features through a dual-branch architecture combining MobileNetV3 with a Feedforward Neural Network. Our approach incorporates Recursive Feature Elimination (RFE) for feature selection and employs a genetic algorithm-based image generation pipeline for optimal data selection. DeepBCTPred achieved superior performance with 98.74% recall, 99.45% specificity, and 97.96% F1-score on the test dataset, significantly outperforming existing state-of-the-art methods, achieving improvements ranging from 2% to 15% in recall, 1.3%-13.1% in F1-score, and 1.5%-16% in Matthews Correlation Coefficient (MCC). 
This framework demonstrates strong potential for clinical implementation in bladder cancer diagnosis and may be extensible to other cancer types for enhanced precision medicine applications.</p><p><strong>Availability and implementation: </strong>The training, validation, and test scripts are freely available at https://github.com/nafcoder/DeepBCTPred.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag087"},"PeriodicalIF":2.8,"publicationDate":"2026-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13140682/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147846280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
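The evaluation metrics named in the abstract above (recall, specificity, F1-score, and Matthews Correlation Coefficient) all derive from confusion-matrix counts. The sketch below shows the standard binary definitions. The counts are invented for illustration and are not the paper's results, and the paper's task may be multi-class, in which case these would be computed per class or averaged.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute recall, specificity, F1 and MCC from binary confusion counts."""
    recall = tp / (tp + fn)        # fraction of true positives that are found
    specificity = tn / (tn + fp)   # fraction of true negatives that are found
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"recall": recall, "specificity": specificity, "f1": f1, "mcc": mcc}


# Hypothetical counts, chosen only to exercise the formulas.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside recall and F1 when classes are imbalanced.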
Bioinformatics advances | Pub Date: 2026-03-25 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag047
Niklas Brunn, Maren Hackenberg, Camila L Fullio, Tanja Vogel, Harald Binder
{"title":"Sparse dimensionality reduction for analyzing single-cell-resolved interactions.","authors":"Niklas Brunn, Maren Hackenberg, Camila L Fullio, Tanja Vogel, Harald Binder","doi":"10.1093/bioadv/vbag047","DOIUrl":"10.1093/bioadv/vbag047","url":null,"abstract":"<p><strong>Summary: </strong>Several approaches have been proposed to reconstruct interactions between groups of cells or individual cells from single-cell transcriptomics data, leveraging prior information about known ligand-receptor interactions. To enhance downstream analyses, we present an end-to-end dimensionality reduction workflow, specifically tailored for single-cell cell-cell interaction data. In particular, we demonstrate that sparse dimensionality reduction can pinpoint specific ligand-receptor interactions in relation to clusters of cell pairs. For sparse dimensionality reduction, we focus on the Boosting Autoencoder approach. Overall, we provide a comprehensive workflow, including result visualization, that simplifies the analysis of interaction patterns in cell pairs. This is supported by a Jupyter notebook that can readily be adapted to different datasets.</p><p><strong>Availability and implementation: </strong>https://github.com/NiklasBrunn/Sparse-dimension-reduction.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag047"},"PeriodicalIF":2.8,"publicationDate":"2026-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13014469/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147522848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2026-03-21 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag086
Austin Chiao, Benjamin Crysup, Jonathan L King, Michael D Coble, August E Woerner
{"title":"Machine learning improves SNP microarray performance in challenged samples.","authors":"Austin Chiao, Benjamin Crysup, Jonathan L King, Michael D Coble, August E Woerner","doi":"10.1093/bioadv/vbag086","DOIUrl":"10.1093/bioadv/vbag086","url":null,"abstract":"<p><p>SNP microarrays provide a cost-effective genotyping method used in various scientific disciplines. Sample costs vary from tens to hundreds of dollars, storage costs are comparatively reasonable, and analysis methods easily scale to large sample sizes. However, microarrays are designed to be used with high-quality samples rather than low-quantity DNA inputs. Consequently, when working with challenged samples, uncertainty must be properly accounted for. Rather than calling crisp genotypes when data are uncertain, it is better to represent them probabilistically. This approach can cleanly feed into tools that directly consider likelihoods while remaining compatible with tools expecting hard calls by removing uncertain genotype calls. Several machine learning algorithms were used to estimate genotypes and genotype likelihoods generated from Illumina Omni5-4 microarray data, and the results were compared. While neural networks and XGBoost were both performant, XGBoost appears to generalize better across sample types generated on the Omni5-4 chips (generalization between technologies awaits further examination). 
Further, it can more directly produce an estimate of genotype quality (as opposed to scores), a feature that has been lacking in microarray analysis.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag086"},"PeriodicalIF":2.8,"publicationDate":"2026-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13091614/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147724870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
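The abstract above describes representing genotypes probabilistically while staying compatible with tools that expect hard calls by removing uncertain calls. A minimal sketch of that conversion follows, assuming per-genotype probabilities are already available from some classifier. The Phred-scaled quality, the cap at 99, the threshold of 20, and the function name are assumptions borrowed from common sequencing conventions, not details taken from the paper.

```python
import math


def call_genotype(probs, min_gq=20.0):
    """Return (hard call or None, Phred-scaled quality) from genotype probabilities.

    probs maps genotype label -> probability; the values are assumed to sum to 1.
    """
    best = max(probs, key=probs.get)
    p_error = 1.0 - probs[best]  # probability that the best call is wrong
    # Phred scale: -10 * log10(error probability), capped at 99 by convention.
    gq = 99.0 if p_error <= 0 else min(99.0, -10.0 * math.log10(p_error))
    # Uncertain calls are removed (None) rather than forced to a genotype.
    return (best if gq >= min_gq else None), gq


confident = {"AA": 0.995, "AB": 0.004, "BB": 0.001}
ambiguous = {"AA": 0.60, "AB": 0.35, "BB": 0.05}

print(call_genotype(confident))
print(call_genotype(ambiguous))
```

The confident example keeps its AA call, while the ambiguous one is returned as a missing call. A likelihood-aware downstream tool could consume the probabilities directly instead of the thresholded call.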