Bioinformatics (Oxford, England): Latest Articles

BTS: a scalable Bayesian Tissue Score for prioritizing GWAS variants and their functional contexts across >1,000s of omics datasets.
IF 5.4
Bioinformatics (Oxford, England) Pub Date : 2025-09-26 DOI: 10.1093/bioinformatics/btaf509
Pavel P Kuksa, Matei Ionita, Luke Carter, Jeffrey Cifello, Prabhakaran Gangadharan, Kaylyn Clark, Otto Valladares, Yuk Yee Leung, Li-San Wang
Motivation: Statistics from genome-wide association studies (GWAS) are widely used in fine-mapping and colocalization analyses to identify causal variants and their enrichment in functional contexts, such as affected cell types and genomic features. With the expansion of functional genomic (FG) datasets, which now include hundreds of thousands of tracks across various cell and tissue types, it is critical to establish scalable algorithms integrating thousands of diverse FG annotations with GWAS results.
Results: We propose BTS (Bayesian Tissue Score), a novel, highly efficient algorithm uniquely designed for (1) identifying affected cell types and functional elements (context-mapping) and (2) fine-mapping potentially causal variants in a context-specific manner using large collections of cell type-specific FG annotation tracks. BTS leverages GWAS summary statistics and annotation-specific Bayesian models to analyze genome-wide annotation tracks, including enhancers, open chromatin, and histone marks. We evaluated BTS on GWAS summary statistics for immune and cardiovascular traits, such as Inflammatory Bowel Disease (IBD), Rheumatoid Arthritis (RA), Systemic Lupus Erythematosus (SLE), and Coronary Artery Disease (CAD). Our results demonstrate that BTS is over 100x more efficient in estimating functional annotation effects and context-specific variant fine-mapping compared to existing methods. Importantly, this large-scale Bayesian approach prioritizes both known and novel annotations, cell types, genomic regions, and variants and provides valuable biological insights into the functional contexts of these diseases.
Availability: A Docker image is available at https://hub.docker.com/r/wanglab/bts with the pre-installed BTS R package (https://bitbucket.org/wanglab-upenn/BTS-R) and the BTS GWAS summary statistics analysis pipeline (https://bitbucket.org/wanglab-upenn/bts-pipeline).
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Enhancing genome recovery across metagenomic samples using MAGmax.
IF 5.4
Bioinformatics (Oxford, England) Pub Date : 2025-09-26 DOI: 10.1093/bioinformatics/btaf538
Arangasamy Yazhini, Johannes Söding
Summary: The number of metagenome-assembled genomes (MAGs) is rapidly increasing with the growing scale of metagenomic studies, driving fast progress in microbiome research. Sample-wise assembly has become the standard due to its computational efficiency and strain-level resolution. It requires dereplication, the removal of near-identical genomes assembled in different metagenomic samples. We present MAGmax, an efficient dereplication tool that enhances both the quantity and quality of MAGs through a strategy of bin merging and reassembly. Unlike dRep, which selects a single representative bin per genome cluster, MAGmax merges multiple bins within a cluster and reassembles them to increase coverage. MAGmax produces more dereplicated, higher-quality MAGs than dRep at 1.6× its speed and using three times less memory.
Availability and implementation: The MAGmax open source software, implemented in Rust, is available under the GPLv3 license at https://github.com/soedinglab/MAGmax.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
PTL-PRS: an R package for transfer learning of polygenic risk scores with pseudovalidation.
IF 5.4
Bioinformatics (Oxford, England) Pub Date : 2025-09-24 DOI: 10.1093/bioinformatics/btaf540
Bokeum Cho, Seunggeun Lee
Summary: Polygenic risk scores (PRSs) are essential tools for predicting individual phenotypic risk but often lack accuracy in non-European ancestry groups. Transfer Learning for Polygenic Risk Scores (TL-PRS) addresses this challenge by leveraging European PRSs to improve prediction in underrepresented ancestries but requires privacy-sensitive individual-level data and has low computational efficiency. Therefore, we introduce PTL-PRS (Pseudovalidated Transfer Learning for PRS), an extension of TL-PRS that incorporates pseudovalidation to eliminate the need for individual-level data and includes further software optimization. For pseudovalidation, PTL-PRS generates pseudo-summary statistics for training and validation and evaluates model performance with the pseudo-R² metric. To improve computational efficiency, the PTL-PRS software was optimized with C++, blockwise early stopping, and direct genotype retrieval. Overall, PTL-PRS enhances usability while maintaining TL-PRS's predictive performance.
Availability and implementation: The PTL.PRS R package is publicly available on GitHub at https://github.com/bokeumcho/PTL.PRS. The summary statistics used in this paper are available in the public domain: UK Biobank (https://pheweb.org/UKB-TOPMed), PGS Catalog (https://www.pgscatalog.org), COVID-19 Host Genetics Initiative (https://www.covid19hg.org) and GenOMICC (https://genomicc.org/data).
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Characterizing the regulatory logic of transcriptional control at the DNA sequence level by ensembles of thermodynamic models.
IF 5.4
Bioinformatics (Oxford, England) Pub Date : 2025-09-24 DOI: 10.1093/bioinformatics/btaf534
Alan Utsuni Sabino, Drielly de Moraes Guerreiro, Ah-Ram Kim, Alexandre Ferreira Ramos, John Reinitz
Motivation: Understanding how the genome encodes the regulatory logic of transcription is a main challenge of the post-genomic era, and can be overcome with the aid of customized computational tools.
Results: We report an automated framework for analyzing an ensemble of fits to data of a thermodynamics-based sequence-level model for transcriptional regulation. The fits are clustered according to their intrinsic regulatory logic. A multiscale analysis enables visualization of quantitative features resulting from the deconvolution of the regulatory profile provided by multiple transcription factors interacting with the locus of a gene. Quantitative experimental data on reporters driven by the whole locus of the even-skipped gene in the blastoderm of Drosophila embryos were used to validate our approach. A few clusters of highly active DNA binding sites within the enhancers collectively modulate even-skipped gene transcription. Analysis of enhancers of variable length shows the importance of interactions between bound proteins for transcriptional regulation. The interplay between activation and quenching enables function conservation of enhancers despite length variations.
Availability and implementation: The transcription factor level data used in the reported study are accessible in the input files on Zenodo and GitHub, as is the full code. Additional data from the former FlyEx database will be available upon request.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
aMLProt: an automated machine learning library for protein applications.
IF 5.4
Bioinformatics (Oxford, England) Pub Date : 2025-09-24 DOI: 10.1093/bioinformatics/btaf543
Ruite Xiang, Christian Domínguez-Dalmases, Albert Cañellas-Solé, Victor Guallar
Motivation: Machine learning tools have become increasingly common in biological research, driven by the emergence of pre-trained large language models. However, training effective models remains a complex task, since many choices influence their performance. AutoML (automated machine learning) approaches help address these challenges by streamlining the entire model development pipeline.
Results: We developed aMLProt, an AutoML framework tailored specifically for protein applications, such as enzyme engineering and bioprospecting. It features a modular design, allowing each component to be used independently or in combination. Notably, aMLProt integrates 19 classifiers and 26 regressors, along with pre-trained protein language models. It also includes standalone applications proven useful for protein-related workflows. To enhance usability, aMLProt is integrated with Horus, a GUI-based application with a visual interface.
Availability: aMLProt is available at https://github.com/etiur/aMLProt.git and https://doi.org/10.5281/zenodo.14971157. The aMLProt plugin is available via the official Horus Plugin Repository https://horus.bsc.es/repo/plugins/amlprot, and Horus itself can be freely downloaded from https://horus.bsc.es. Moreover, a demo of aMLProt can be found, without previous registration or download, at horus.bsc.es/amlprot and horus.bsc.es/amlprot-suggest. The results and data from the pH optima regression model are available at https://zenodo.org/records/15394097.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
SPAED: Harnessing AlphaFold Output for Accurate Segmentation of Phage Endolysin Domains.
IF 5.4
Bioinformatics (Oxford, England) Pub Date : 2025-09-24 DOI: 10.1093/bioinformatics/btaf531
Alexandre Boulay, Emma Cremelie, Clovis Galiez, Yves Briers, Elsa Rousseau, Roberto Vázquez
Summary: SPAED is an accessible tool for the accurate segmentation of protein domains that leverages information contained in the predicted aligned error (PAE) matrix obtained from AlphaFold to better identify domain-linker boundaries and detect terminal disordered regions. On a dataset of 376 bacteriophage endolysins (proteins that degrade the bacterial cell wall), SPAED achieves a mean intersect-over-union score of 96% and a domain-boundary-distance score of 89% compared to 94% and 70%, respectively, for the state-of-the-art tool Chainsaw.
Availability and implementation: Implemented in Python, SPAED is accessible on the web (https://spaed.ca) and available for download from https://github.com/Rousseau-Team/spaed or https://pypi.org/project/spaed. The data used to test SPAED can be found at https://doi.org/10.5281/zenodo.15285860.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
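The intersect-over-union score quoted in the SPAED summary can be illustrated with a minimal sketch. It assumes domains are given as inclusive (start, end) residue ranges and pools all domain residues into one set; the function names and example ranges are hypothetical, and SPAED's own scoring (for example, per-domain matching) may well differ.

```python
def residues(domains):
    """Expand inclusive (start, end) residue ranges into a set of positions."""
    covered = set()
    for start, end in domains:
        covered.update(range(start, end + 1))
    return covered


def intersect_over_union(predicted, reference):
    """Fraction of residues shared by both segmentations over residues in either."""
    pred, ref = residues(predicted), residues(reference)
    union = pred | ref
    if not union:
        # Two empty segmentations agree trivially.
        return 1.0
    return len(pred & ref) / len(union)


# Hypothetical two-domain protein: the first predicted domain ends 10 residues early.
predicted = [(1, 100), (121, 200)]
reference = [(1, 110), (121, 200)]
score = intersect_over_union(predicted, reference)
print(f"IoU = {score:.3f}")
```

In this toy case 180 residues are shared and 190 lie in the union, so a boundary misplaced by 10 residues costs about five points of IoU.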
Accessible, uniform protein property prediction with a scikit-learn based toolset AIDE.
IF 5.4
Bioinformatics (Oxford, England) Pub Date : 2025-09-24 DOI: 10.1093/bioinformatics/btaf544
Evan Komp, Kristoffer E Johansson, Nicholas P Gauthier, Japheth E Gado, Kresten Lindorff-Larsen, Gregg T Beckham
Summary: Protein property prediction via machine learning with and without labeled data is becoming increasingly powerful, yet methods are disparate and capabilities vary widely over applications. The software presented here, "Artificial Intelligence Driven protein Estimation (AIDE)," enables instantiating, optimizing, and testing many zero-shot and supervised property prediction methods for variants and variable length homologs in a single, reproducible notebook or script by defining a modular, standardized application programming interface (API) that is drop-in compatible with scikit-learn transformers and pipelines.
Availability and implementation: AIDE is an installable, importable Python package inheriting from scikit-learn classes and API and is installable on Windows, Mac, and Linux. Many of the wrapped models internal to AIDE will be effectively inaccessible without a GPU, and some assume CUDA. The newest stable, tested version can be found at https://github.com/beckham-lab/aide_predict and a full user guide and API reference can be found at https://beckham-lab.github.io/aide_predict/. Static versions of both at the time of writing can be found on Zenodo (Komp and Beckham 2025).
Supplementary information: Digital supplementary data contains API examples and a user guide. Appendix A and B provide PDFs of notebooks for showcases. Source data for figures are provided.
Citations: 0
HyperPhS: A pharmacophore-guided multimodal representation framework for metabolic stability prediction through contrastive hypergraph learning.
IF 5.4
Bioinformatics (Oxford, England) Pub Date : 2025-09-22 DOI: 10.1093/bioinformatics/btaf524
Xiaoyi Liu, Na Zhang, Chenglong Kang, Hongpeng Yang, Chengwei Ai, Jijun Tang, Fei Guo
Motivation: Metabolic stability is crucial in the early stage of drug discovery and development. Drug candidate screening and optimization can be streamlined through the accurate prediction of stability. Functional groups within drug molecules are known as pharmacophores, which bind directly to receptors or biological macromolecules to produce biological effects, thereby affecting metabolic stability. Therefore, determining metabolic stability via the pharmacophore groups remains a significant challenge.
Results: To address these issues, we propose a Pharmacophore-guided Hypergraph representation framework for predicting metabolic Stability (HyperPhS). In this study, we introduce a hypergraph-based method to extract features from metabolic pharmacophores with multi-view representation and contrastive learning. In particular, we introduce a pharmacophore-based contrastive learning encoder that captures the consistency between functional and nonfunctional structures. Our method applies ChatGPT simultaneously to metabolites and heterogeneous encoders and integrates multimodal representations by using attention-driven fusion modules coupled with fully connected neural networks. On the HLM dataset, HyperPhS achieves outstanding performance with 87.6% in AUC and 62.6% in MCC, alongside an external test AUC of 88.3%. In addition, pharmacophore groups studied by HyperPhS are validated for their interpretability through case studies. Overall, HyperPhS is an effective and interpretable tool for determining metabolic stability, identifying critical functional groups, and optimizing compounds.
Availability and implementation: The code and data are available at https://github.com/xiaoyiliu-usc/HyperPhS.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
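The HyperPhS abstract reports performance as MCC (Matthews correlation coefficient) alongside AUC. As a reminder of what that number measures, here is a minimal sketch of the standard MCC formula computed from a binary confusion matrix. The counts are hypothetical and are not taken from the paper; the function name is chosen for illustration.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: MCC is set to 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for a stable/unstable classifier on 100 compounds.
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
print(f"MCC = {mcc:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported for imbalanced datasets.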
TVAE-RNA: Ensemble-Based RNA Secondary Structure Prediction via Transformer Variational Autoencoders.
IF 5.4
Bioinformatics (Oxford, England) Pub Date : 2025-09-22 DOI: 10.1093/bioinformatics/btaf527
Xiyuan Mei, Hanbo Liu, Yuheng Zhu, Enshuang Zhao, Longyi Li, Hao Zhang
Motivation: Accurate prediction of RNA secondary structure remains challenging due to the presence of pseudoknots, long-range dependencies, and limited labeled data.
Results: We propose TVAE, a novel framework that integrates a Transformer encoder with a Variational Autoencoder (VAE). The Transformer captures global dependencies in the sequence, while the VAE models structural variability by learning a probabilistic latent space. Unlike deterministic models, TVAE generates diverse and biologically plausible secondary structures, enabling more comprehensive structure discovery. To obtain discrete predictions, we introduce GHA-Pairing, a fast and biologically constrained base-pairing algorithm. TVAE demonstrates strong generalization across different RNA families and achieves state-of-the-art performance on benchmark datasets, reaching an F1 score of 0.89 and 83% accuracy, surpassing existing methods by 10%. These results highlight the advantage of probabilistic modeling for RNA structure prediction and its potential to enhance biological insights.
Availability and implementation: Code and pretrained models are available at https://github.com/mei-rna/TVAE-RNA. The released version of the dataset and models can also be accessed via DOI: 10.5281/zenodo.16946114.
Citations: 0
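The F1 score cited for TVAE-RNA is commonly computed over predicted versus reference base pairs. The sketch below shows that generic calculation under the assumption that a structure is a collection of (i, j) index pairs; the function name and the example pairs are hypothetical, and the paper's exact evaluation protocol is not stated in the abstract.

```python
def base_pair_f1(predicted_pairs, true_pairs):
    """F1 score between two collections of base pairs, ignoring pair orientation."""
    pred = {tuple(sorted(pair)) for pair in predicted_pairs}
    true = {tuple(sorted(pair)) for pair in true_pairs}
    true_positives = len(pred & true)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(true)
    return 2 * precision * recall / (precision + recall)


# Hypothetical hairpin: three of four predicted pairs are correct, two true pairs are missed.
predicted = [(1, 20), (2, 19), (3, 18), (5, 15)]
reference = [(1, 20), (2, 19), (3, 18), (4, 17), (6, 14)]
f1 = base_pair_f1(predicted, reference)
print(f"F1 = {f1:.3f}")
```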
Federated Learning for the pathogenicity annotation of genetic variants in multi-site clinical settings.
IF 5.4
Bioinformatics (Oxford, England) Pub Date : 2025-09-19 DOI: 10.1093/bioinformatics/btaf523
Nigreisy Montalvo, Francisco Requena, Emidio Capriotti, Antonio Rausell
Motivation: Rare diseases collectively affect 5% of the population. However, fewer than 50% of rare disease patients receive a molecular diagnosis after whole genome sequencing. Supervised machine learning is a valuable approach for the pathogenicity scoring of human genetic variants. However, existing methods are often trained on curated but limited central repositories, resulting in poor accuracy when tested on external cohorts. Yet, large collections of variants generated at hospitals and research institutions remain inaccessible for machine-learning purposes because of privacy and legal constraints. Federated learning (FL) algorithms have recently been developed that enable institutions to collaboratively train models without sharing their local datasets.
Results: Here, we present a proof-of-concept study evaluating the effectiveness of federated learning for the clinical classification of genetic variants. A comprehensive array of diverse FL strategies was assessed for coding and non-coding Single Nucleotide Variants as well as Copy Number Variants. Our results showed that federated models generally achieved comparable or superior performance to traditional centralized learning. In addition, federated models reached a robust generalization to independent sets with smaller data fractions as compared to their centralized model counterparts. Our findings support the adoption of FL to establish secure multi-institutional collaborations in human variant interpretation.
Availability: All source code required to reproduce the results presented in this manuscript, implemented in Python, is available under the GNU General Public License v3 at https://github.com/RausellLab/FedLearnVar.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
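The core idea in the federated learning abstract, training collaboratively without sharing local datasets, can be illustrated with federated averaging (FedAvg), one widely used aggregation rule. The abstract does not say which strategies the authors assessed, so this is only an illustrative sketch; the function name, site weights, and cohort sizes are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Combine per-site model parameters, weighting each site by its sample count."""
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("at least one client must contribute samples")
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != n_params:
            raise ValueError("all clients must share the same model shape")
        for i, value in enumerate(weights):
            averaged[i] += value * size / total
    return averaged


# Hypothetical parameters trained locally at three hospitals; only these
# parameter vectors, never the underlying variant data, reach the server.
site_weights = [[0.2, -1.0], [0.4, 0.0], [1.0, 2.0]]
site_sizes = [100, 300, 600]
global_weights = federated_average(site_weights, site_sizes)
print(global_weights)
```

In a full training loop the server would send `global_weights` back to each site for another round of local training; that loop is omitted here to keep the aggregation step in focus.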