{"title":"acmgscaler: an R package and Colab for standardized gene-level variant effect score calibration within the ACMG/AMP framework.","authors":"Mihaly Badonyi, Joseph A Marsh","doi":"10.1093/bioinformatics/btaf503","DOIUrl":"10.1093/bioinformatics/btaf503","url":null,"abstract":"<p><strong>Motivation: </strong>A genome-wide variant effect calibration method was recently developed under the guidelines of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP), following ClinGen recommendations for variant classification. While genome-wide approaches offer clinical utility, emerging evidence highlights the need for gene- and context-specific calibration to improve accuracy. Building on previous work, we have developed an algorithm tailored to converting functional scores from both multiplexed assays of variant effects (MAVEs) and computational variant effect predictors (VEPs) into ACMG/AMP evidence strengths.</p><p><strong>Results: </strong>Our method is designed to deliver consistent performance across different genes and score distributions, with all variables adaptively determined from the input data, preventing selective adjustments or overfitting that could inflate evidence strengths beyond empirical support. To facilitate adoption, we introduce acmgscaler, a lightweight R package and a plug-and-play Google Colab notebook for the calibration of custom datasets. 
This algorithmic framework bridges the gap between MAVEs/VEPs and clinically actionable variant classification.</p><p><strong>Availability and implementation: </strong>The R package and Colab notebook are available at https://github.com/badonyi/acmgscaler.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12496131/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145034781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FootprintCharter: unsupervised detection and quantification of footprints in single molecule footprinting data.","authors":"Guido Barzaghi, Arnaud R Krebs, Judith B Zaugg","doi":"10.1093/bioinformatics/btaf502","DOIUrl":"10.1093/bioinformatics/btaf502","url":null,"abstract":"<p><strong>Summary: </strong>Single molecule footprinting profiles the heterogeneity of TF occupancy at cis-regulatory elements across cell populations at unprecedented resolution. The single molecule nature of the data in principle allows for observing the footprint of individual transcription factors and nucleosomes. However, we currently lack algorithms to quantify these occupancy patterns of chromatin binding factors in an automated way and without prior assumptions on their genomic location. Here we present FootprintCharter, an unsupervised tool to detect and quantify footprints for transcription factors (TFs) and nucleosomes from single molecule footprinting data. After detection, TF footprints can be labeled with orthogonal motif annotations provided by the user. 
FootprintCharter allows for the quantification of complex molecular states such as positioning of unphased nucleosomes and combinatorial co-binding of multiple TFs.</p><p><strong>Availability and implementation: </strong>FootprintCharter is freely available on Bioconductor with version 2.2.0 of https://bioconductor.org/packages/SingleMoleculeFootprinting through the functions FootprintCharter, PlotFootprints, and Plot_FootprintCharter_SM.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12490827/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145093169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SKiM: accurately classifying metagenomic ONT reads in limited memory.","authors":"Trevor Schneggenburger, Jaroslaw Zola","doi":"10.1093/bioinformatics/btaf537","DOIUrl":"10.1093/bioinformatics/btaf537","url":null,"abstract":"<p><strong>Motivation: </strong>Oxford Nanopore Technologies' devices, such as MinION, permit affordable, real-time DNA sequencing, and come with targeted sequencing capabilities. Such capabilities create new challenges for metagenomic classifiers that must be computationally efficient yet robust enough to handle potentially erroneous DNA reads, while ideally inspecting only a few hundred bases of a read. Currently available DNA classifiers leave room for improvement with respect to classification accuracy, memory usage, and the ability to operate in targeted sequencing scenarios.</p><p><strong>Results: </strong>We present SKiM: Short K-mers in Metagenomics, a new lightweight metagenomic classifier designed for ONT reads. Compared to state-of-the-art classifiers, SKiM requires only a fraction of memory to run, and can classify DNA reads with higher accuracy after inspecting only their first few hundred bases. 
To achieve this, SKiM introduces new data compression techniques to maintain a reference database built from short k-mers, and treats classification as a statistical testing problem.</p><p><strong>Availability and implementation: </strong>SKiM source code, documentation, and test data are available from: https://gitlab.com/SCoRe-Group/skim.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12502918/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145133024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy-preserving federated unsupervised domain adaptation with application to age prediction from DNA methylation data.","authors":"Cem Ata Baykara, Ali Burak Ünal, Nico Pfeifer, Mete Akgün","doi":"10.1093/bioinformatics/btaf465","DOIUrl":"10.1093/bioinformatics/btaf465","url":null,"abstract":"<p><strong>Motivation: </strong>Generalizing machine learning models across small, high-dimensional, and heterogeneous biological datasets remains a critical challenge due to domain shifts caused by variations in data collection, population differences, and privacy constraints that restrict data sharing. Existing federated domain adaptation (FDA) approaches primarily rely on deep learning and focus on classification tasks, making them unsuitable for privacy-sensitive, small-scale regression problems in biomedical research. We introduce a privacy-preserving federated method for unsupervised domain adaptation in regression, enabling robust learning across distributed, high-dimensional datasets while maintaining full data privacy.</p><p><strong>Results: </strong>Our method is the first to enable distributed training of Gaussian processes for domain adaptation, ensuring complete privacy through randomized encoding and secure aggregation. Unlike deep learning-based FDA approaches, our method is specifically designed for small-scale, high-dimensional biological data, overcoming prior limitations in scalability and generalization. We evaluate our approach on age prediction from DNA methylation data, demonstrating that it achieves performance comparable to non-private state-of-the-art methods while fully preserving data privacy. 
This work enables secure and effective cross-institutional collaboration in biomedical research without requiring raw data sharing.</p><p><strong>Availability and implementation: </strong>The source code for our method is available at https://github.com/mdppml/FREDA.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144982412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating sparse regression models in multi-task learning and transfer learning through adaptive penalisation.","authors":"Armin Rauschenberger, Petr V Nazarov, Enrico Glaab","doi":"10.1093/bioinformatics/btaf406","DOIUrl":"10.1093/bioinformatics/btaf406","url":null,"abstract":"<p><strong>Method: </strong>Here, we propose a simple two-stage procedure for sharing information between related high-dimensional prediction or classification problems. In both stages, we perform sparse regression separately for each problem. While this is done without prior information in the first stage, we use the coefficients from the first stage as prior information for the second stage. Specifically, we designed feature-specific and sign-specific adaptive weights to share information on feature selection, effect directions, and effect sizes between different problems.</p><p><strong>Results: </strong>The proposed approach is applicable to multi-task learning as well as transfer learning. It provides sparse models (i.e. with few non-zero coefficients for each problem) that are easy to interpret. 
We show by simulation and application that it tends to select fewer features while achieving a similar predictive performance as compared to available methods.</p><p><strong>Availability and implementation: </strong>An implementation is available in the R package \"sparselink\" (https://github.com/rauschenberger/sparselink, https://cran.r-project.org/package=sparselink).</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12502914/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144661276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FUSION: a family-level integration approach for robust differential analysis of small non-coding RNAs.","authors":"Hukam C Rawal, Qi Chen, Tong Zhou","doi":"10.1093/bioinformatics/btaf526","DOIUrl":"10.1093/bioinformatics/btaf526","url":null,"abstract":"<p><strong>Motivation: </strong>Beyond well-studied microRNAs, noncanonical small non-coding RNAs (sncRNAs) derived from longer parental templates such as tRNAs, rRNAs, and Y RNAs, are emerging as important regulators in various biological processes and diseases. Yet, analyzing these noncanonical sncRNAs from sequencing data remains challenging due to the intrinsic sequence heterogeneity and highly noisy nature. Conventional strategies either sum up all sequencing reads mapped to a parental RNA, which sacrifices the resolution of single sncRNA species, or treat each unique RNA species/sequence independently, which faces substantial noise in low-replicate settings.</p><p><strong>Results: </strong>Here, we introduce FUSION (Family-level Unique Small RNA Integration), a computational tool bridging these conventional approaches by first quantifying unique sncRNA species and then aggregating them into their respective parental RNA families. This family-level integration captures the contributions of individual sncRNA species while enhancing statistical power and robustness for differential abundance analysis. FUSION includes two modules: FUSION_ms, which reduces noise and amplifies signals for multiple-sample comparison to detect family-level abundance changes even with a small sample size, and FUSION_ps, which is powered by paired-sample analysis and optimized for \"1-on-1\" differential abundance analysis in single-case studies. Both modules are validated by cross-lab discoveries of dysregulated sncRNA families that could not be identified using conventional methods. 
In summary, FUSION provides a powerful framework for sncRNA sequencing data analysis, enhancing data interpretation and supporting small sample research.</p><p><strong>Availability and implementation: </strong>FUSION is available at https://github.com/cozyrna/FUSION and archived at https://doi.org/10.5281/zenodo.16929712.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12502913/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145093127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DiSTect: a Bayesian model for disease-associated gene discovery and prediction in spatial transcriptomics.","authors":"Qicheng Zhao, Anji Deng, Qihuang Zhang","doi":"10.1093/bioinformatics/btaf530","DOIUrl":"10.1093/bioinformatics/btaf530","url":null,"abstract":"<p><strong>Motivation: </strong>Identifying disease-indicative genes is critical for deciphering disease mechanisms and has attracted significant interest in biomedical research. Spatial transcriptomics offers unprecedented insights for the detection of disease-associated genes by enabling within-tissue contrasts. However, this new technology poses challenges for conventional statistical models developed for RNA-sequencing, as these models often neglect the spatial correlation of the disease status among tissue spots.</p><p><strong>Results: </strong>In this article, we propose DiSTect, a Bayesian shrinkage model to characterize the relationship between high-dimensional gene expressions and the disease status of each tissue spot, incorporating spatial correlation among these spots through autoregressive terms. Our model adopts a hierarchical structure to facilitate the analysis of multiple correlated samples and is further extended to accommodate the missing data within tissues. To ensure the model's applicability to datasets of varying sizes, we carry out two computational frameworks for Bayesian parameter estimation, tailored to both small and large sample scenarios. Simulation studies are conducted to evaluate the performance of the proposed model. 
The proposed model is applied to analyze the data arising from studies of HER2+ breast cancer and Alzheimer's disease.</p><p><strong>Availability and implementation: </strong>The dataset and source code are available on GitHub (https://github.com/StaGill/DiSTect) and Zenodo (https://zenodo.org/records/17127211).</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12502917/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145115366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conformal inference for reliable single cell RNA-seq annotation.","authors":"Marcos López-De-Castro, Alberto García-Galindo, José González-Gomariz, Rubén Armañanzas","doi":"10.1093/bioinformatics/btaf521","DOIUrl":"10.1093/bioinformatics/btaf521","url":null,"abstract":"<p><strong>Motivation: </strong>Despite the inherent complexity associated with automatic cell type assignments, most supervised learning models overlook rigorous uncertainty quantification on the annotations. Although some existing pipelines incorporate rejection options under predefined circumstances, they usually rely on arbitrary assumptions and do not provide statistical guarantees. In this work, we propose a methodology based on the conformal prediction framework to provide reliable single-cell annotations. Conformal prediction provides statistical guarantees on the outcome predictions without making any assumption about the underlying distribution of the data. Our methodological proposal leverages conformal inference to address two critical challenges in single-cell RNA sequencing annotations: (i) detect out-of-distribution cell types in the query data; and, (ii) perform reliable uncertainty quantification of the cell annotations through well-calibrated prediction sets.</p><p><strong>Results: </strong>We evaluated the anomaly detector and the uncertainty-aware annotator in 10 batched experiments derived from various tissues. Specifically, we studied three different annotation taxonomies (standard, classwise, and cluster) alongside three different non-conformity measures. The results showed that our anomaly detector effectively identified previously unseen cell types, producing well-calibrated prediction sets. This rigorous annotation helped maintain coverage probabilities at the expected significance level. 
Finally, we illustrate how the integration of conformal prediction outputs enhanced further downstream analyses.</p><p><strong>Availability and implementation: </strong>The automatic scRNA-seq annotator is available at https://github.com/digital-medicine-research-group-UNAV/conformalized_single_cell_annotator and https://doi.org/10.5281/zenodo.15870599.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12506889/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145093156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Balancing Complexity and Clarity-Towards Clinician-Ready Antibiotic Resistance Prediction Models.","authors":"Dickson Aruhomukama","doi":"10.1093/bioinformatics/btaf556","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf556","url":null,"abstract":"<p><strong>Motivation: </strong>The escalating challenge of antibiotic resistance (ABR) demands clinician-ready machine learning models that are not only accurate but interpretable.</p><p><strong>Results: </strong>By treating resistance genes as independent features and augmenting them with curated single-nucleotide polymorphisms and contextual markers, this approach delivers scalable, transparent predictions aligned with clinical decision-making needs.</p><p><strong>Availability: </strong>Not applicable.</p><p><strong>Supplementary information: </strong>Not applicable.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145202393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"scBSP: A fast and accurate tool for identifying spatially variable features from high-resolution spatial omics data.","authors":"Jinpu Li, Mauminah Raina, Yiqing Wang, Chunhui Xu, Li Su, Qi Guo, Ricardo Melo Ferreira, Michael T Eadon, Qin Ma, Juexin Wang, Dong Xu","doi":"10.1093/bioinformatics/btaf554","DOIUrl":"10.1093/bioinformatics/btaf554","url":null,"abstract":"<p><strong>Motivation: </strong>Emerging spatial omics technologies empower comprehensive exploration of biological systems from multi-omics perspectives in their native tissue location in two and three-dimensional space. However, the limited sequencing depth, increasing spatial resolution, and growing spatial spots in spatial omics technologies present significant computational challenges in identifying biologically meaningful molecules with variable spatial distributions across various omics modalities.</p><p><strong>Results: </strong>We introduce scBSP, an open-source, versatile, and user-friendly package for identifying spatially variable features in large-scale spatial omics data. 
scBSP demonstrates significantly enhanced computational efficiency, processing high-resolution spatial omics data within seconds, and exhibits robust cross-platform performance by consistently identifying spatially variable features with high reproducibility across various sequencing platforms.</p><p><strong>Availability: </strong>scBSP is available for download from R CRAN at https://cran.r-project.org/web/packages/scBSP/index.html and PyPI at https://pypi.org/project/scbsp/.</p><p><strong>Supplementary information: </strong>The supplementary data and code are openly available from Zenodo at https://zenodo.org/records/14768450.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145202442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}