Bioinformatics (Oxford, England): latest articles, page 9

HallmarkGraph: a cancer hallmark informed graph neural network for classifying hierarchical tumor subtypes.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-09-01 DOI: 10.1093/bioinformatics/btaf444

Qingsong Zhang, Fei Liu, Xin Lai

Motivation: Accurate tumor subtype diagnosis is crucial for precision oncology, yet current methodologies face significant challenges. These include balancing model accuracy with interpretability and the high costs of generating multi-omics data in clinical settings. Moreover, there is a lack of validated models capable of classifying hierarchical tumor subtypes across a comprehensive pan-cancer cohort.

Results: We present a graph neural network, HallmarkGraph, the first biologically informed model developed to classify hierarchical tumor subtypes in human cancer. Inspired by cancer hallmarks, the model's architecture integrates transcriptome profiles and gene regulatory interactions to perform multi-label classification. We evaluate the model on a comprehensive pan-cancer cohort comprising 11 476 samples from 26 primary cancers with 405 subtypes up to eight levels. The model demonstrates exceptional performance, achieving 5-fold cross-validation accuracy between 85% and 99% for tumor subtypes labeled with increasing details of genomic information. It also shows good generalizability on a validation dataset of 887 samples, assessed using three metrics that consider tumor subtypes at individual, combined, and sample levels. Benchmarking and ablation experiments show that hallmark-based embeddings slightly influence model performance, while the integrated multilayer perceptron plays a significant role in determining classifier accuracy. Additionally, we use the SHAP method to link cancer hallmarks with genes, identifying key features that influence model decisions. Our findings present a biologically informed machine learning framework capable of tracking tumor transcriptomic trajectories and distinguishing inter- and intra-tumor heterogeneity in pan-cancer. This approach holds promise for enhancing cancer diagnostics.

Availability and implementation: HallmarkGraph is accessible at https://github.com/laixn/HallmarkGraph.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12401579/pdf/

Citations: 0

scPOEM: robust co-embedding of peaks and genes revealing peak-gene regulation.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-09-01 DOI: 10.1093/bioinformatics/btaf483

Yan Zhong, Yuntong Hou, Yongjian Yang, Xinyue Zheng, James J Cai

Motivation: Identifying regulatory elements in various chromosomal regions that influence gene expression is a fundamental challenge in epigenomics, with profound implications for understanding gene regulation and disease mechanisms. The advent of paired single-cell RNA sequencing and single-cell ATAC sequencing has created unprecedented opportunities to address this challenge by enabling simultaneous profiling of gene expression and chromatin accessibility at single-cell resolution. However, the inherent signals between them are weak due to the highly sparse and noisy nature of data.

Results: This article proposes single-cell meta-Path based Omics Embedding (scPOEM), a novel embedding method that jointly projects chromatin accessibility peaks and expressed genes into a shared low-dimensional space. By integrating the relationships among peak-peak, peak-gene, and gene-gene interactions, scPOEM assigns closer representations in the embedding space to related peak-gene pairs. Our experiments demonstrate that scPOEM generates stable representations of peaks and genes, outperforms existing methods in recovering biologically meaningful peak-gene regulatory relationships and enables new insights in subgroup and differential analysis of gene regulation. These results highlight its potential to uncover gene regulatory mechanisms and enhance the understanding of transcriptional regulation at single-cell resolution.

Availability and implementation: The source code of scPOEM is available at https://github.com/Houyt23/scPOEM. The datasets can be obtained from the 10× Genomics (https://www.10xgenomics.com/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-10-k-1-standard-1-0-0) and GEO database under access codes GSE194122 and GSE239916.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12449255/pdf/

Citations: 0

SPACE: STRING proteins as complementary embeddings.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-09-01 DOI: 10.1093/bioinformatics/btaf496

Dewei Hu, Damian Szklarczyk, Christian von Mering, Lars Juhl Jensen

Motivation: Representation learning has revolutionized sequence-based prediction of protein function and subcellular localization. Protein networks are an important source of information complementary to sequences, but the use of protein networks has proven to be challenging in the context of machine learning, especially in a cross-species setting.

Results: We leveraged the STRING database of protein networks and orthology relations for 1322 eukaryotes to generate network-based cross-species protein embeddings. We did this by first creating species-specific network embeddings and subsequently aligning them based on orthology relations to facilitate direct cross-species comparisons. We show that these aligned network embeddings ensure consistency across species without sacrificing quality compared to species-specific network embeddings. We also show that the aligned network embeddings are complementary to sequence embedding techniques, despite the use of sequence-based orthology relations in the alignment process. Finally, we validated the embeddings by using them for two well-established tasks: subcellular localization prediction and protein function prediction. Training logistic regression classifiers on aligned network embeddings and sequence embeddings improved the accuracy over using sequence alone, reaching performance numbers close to state-of-the-art deep-learning methods.

Availability and implementation: The source code and scripts for generating the network-based cross-species protein embeddings are available at https://github.com/deweihu96/SPACE. Precomputed network embeddings and sequence embeddings for all eukaryotic proteins are included in STRING version 12.0 (https://string-db.org/cgi/download).

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453690/pdf/
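
The SPACE abstract describes aligning species-specific embeddings using orthology relations but does not name the alignment algorithm. As an illustration only, the sketch below uses orthogonal Procrustes, a common way to map one embedding space onto another given paired anchor points. The choice of Procrustes, the toy random data, and the name `procrustes_rotation` are assumptions for this example, not the authors' method.

```python
import numpy as np

def procrustes_rotation(source, target):
    """Return the orthogonal matrix R minimising ||source @ R - target||_F.

    Rows of `source` and `target` are paired anchors (here: hypothetical
    ortholog pairs embedded in two different species-specific spaces).
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

rng = np.random.default_rng(0)

# Toy data: 50 anchor proteins in an 8-dimensional "species A" space.
target_anchors = rng.normal(size=(50, 8))

# Simulate "species B" embeddings of the orthologs as a rotated copy.
true_rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
source_anchors = target_anchors @ true_rotation.T

# Learn the rotation from the anchors and map species B into species A's space.
R = procrustes_rotation(source_anchors, target_anchors)
aligned = source_anchors @ R
```

Because the learned map is orthogonal, distances within the source space are preserved after alignment, which is one reason this family of methods is often chosen when within-species embedding quality should not degrade.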

Citations: 0

NetMedPy: a Python package for large-scale network medicine screening.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-09-01 DOI: 10.1093/bioinformatics/btaf338

Andres Aldana, Michael Sebek, Gordana Ispirova, Rodrigo Dorantes-Gilardi, Joseph Loscalzo, Albert-László Barabási, Giulia Menichetti

Summary: Network medicine leverages the quantification of information flow within sub-cellular networks to elucidate disease etiology and comorbidity, as well as to predict drug efficacy and identify potential therapeutic targets. However, current Network Medicine toolsets often lack computationally efficient data processing pipelines that support diverse scoring functions, network distance metrics, and null models. These limitations hamper their application in large-scale molecular screening, hypothesis testing, and ensemble modeling. To address these challenges, we introduce NetMedPy, a highly efficient and versatile computational package designed for comprehensive Network Medicine analyses.

Availability and implementation: NetMedPy is an open-source Python package under an MIT license. Source code, documentation, and installation instructions can be downloaded from https://github.com/menicgiulia/NetMedPy and https://pypi.org/project/NetMedPy. The package can run on any standard desktop computer or computing cluster.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12401583/pdf/

Citations: 0

Less is more: improving cell-type identification with augmentation-free single-cell RNA-Seq contrastive learning.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-09-01 DOI: 10.1093/bioinformatics/btaf437

Ibrahim Alsaggaf, Daniel Buchan, Cen Wan

Motivation: Cell-type identification is one of the most important tasks in single-cell RNA Sequencing (scRNA-Seq) analysis. Recent research has revealed contrastive learning's great potential in handling multiple cell-type identification tasks.

Results: In this work, we proposed a novel augmentation-free scRNA-Seq contrastive learning (AF-RCL) algorithm, which simplifies the conventional data augmentation operation and adopts a new contrastive learning loss function. A large-scale empirical evaluation suggests that AF-RCL not only outperformed other contrastive learning-based cell-type identification methods but also obtained state-of-the-art predictive performance compared with other well-known cell-type identification methods. Further analysis also shows AF-RCL's advantages in learning high-quality discriminative feature representations based on scRNA-Seq expression profiles.

Availability and implementation: The source code is available at https://doi.org/10.6084/m9.figshare.28830311.v1 and at https://github.com/ibrahimsaggaf/AFRCL. The pre-trained AF-RCL encoders can be downloaded from https://doi.org/10.5281/zenodo.15109736, and the scRNA-Seq datasets used in this work can be downloaded from https://doi.org/10.5281/zenodo.8087611.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12417077/pdf/

Citations: 0

MDGraphEmb: a toolkit for graph embedding and classification of protein conformational ensembles.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-09-01 DOI: 10.1093/bioinformatics/btaf420

Ferdoos Hossein Nezhad, Namir Oues, Massimiliano Meli, Alessandro Pandini

Motivation: Molecular Dynamics (MD) simulations are essential for investigating protein dynamics and function. Although significant advances have been made in integrating simulation techniques and machine learning, there are still challenges in selecting the most suitable data representation for learning. Graph embedding is a powerful computational method that automatically learns low-dimensional representations of nodes in a graph while preserving graph topology and node properties, thereby bridging graph structures and machine learning methods. Graph embeddings hold great potential for efficiently representing MD simulation data and studying protein dynamics.

Results: We present MDGraphEmb, a Python library built on MDAnalysis, specifically designed to convert protein MD simulation trajectories into graph-based representations and corresponding graph embeddings. This transformation enables the compression of high-dimensional, noisy trajectories from protein simulations into tabular formats suitable for machine learning. MDGraphEmb provides a framework that supports a range of graph embedding techniques and machine learning models, enabling the creation of workflows to analyse protein dynamics and identify important protein conformations. Graph embedding effectively captures and compresses structural information from protein MD simulation data, making it applicable to diverse downstream machine-learning classification tasks. We present an application for encoding and detecting important protein conformations from molecular dynamics simulations to classify functional states, using adenylate kinase (ADK) as the main case study. To assess the generalizability of the approach, two additional systems, Plantaricin E (PlnE) and HIV-1 protease are included as supplementary validation examples. A performance comparison of different graph embedding methods combined with machine learning models is also provided.

Availability and implementation: MDGraphEMB GitHub Repository: https://github.com/FerdoosHN/MDGraphEMB.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453676/pdf/

Citations: 0

deepNGS navigator: exploring antibody NGS datasets using deep contrastive learning.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-09-01 DOI: 10.1093/bioinformatics/btaf414

Homa MohammadiPeyhani, Edith Lee, Richard Bonneau, Vladimir Gligorijevic, Jae Hyeon Lee

Motivation: High-throughput sequencing uncovers how B-cells adapt in response to antigens by generating B-cell-receptor (BCR) sequences at an unprecedented scale. As BCR datasets grow to millions of sequences, using efficient computational methods becomes crucial. One important aspect of antibody sequence analysis is detecting clonal families or clusters of related sequences, whether they come from immunization, synthetic-libraries or even ML-generated datasets.

Results: We introduce deepNGS Navigator, a computational tool that leverages language models and contrastive learning to transform antibody sequences into intuitive 2D representations. The resulting 2D maps offer a visualization of overall diversity of input datasets, which can be clustered based on the sequence distances and their densities across the map. Beyond grouping related sequences, the 2D maps also represent mutational patterns inferred from sequence embeddings, enabling trajectory analysis and clustering within the projected space. By overlaying properties such as charge, the map helps identify clusters of interest for further investigation while also flagging potentially noisy or non-specific sequences with higher risk. We demonstrate deepNGS Navigator's utilities on several datasets, including: (i) a synthetic-library from a yeast-display targeting HER2, (ii) a machine learning-generated dataset with a hierarchical structure, (iii) NGS sequences from a llama immunized against COVID RBD, (iv) human naive and memory B-cell sequences, and (v) an in silico dataset simulating B-cell clonal lineages.

Availability and implementation: The deepNGS Navigator source code is available at: github.com/prescient-design/deepngs-navigator and github.com/prescient-design/deepngs-navigator-panel-app.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448221/pdf/

Citations: 0

KGCLMDA: a computational model for predicting latent associations of microbial drugs using knowledge graphs and contrastive learning.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-09-01 DOI: 10.1093/bioinformatics/btaf457

Meiling Liu, Shujuan Su, Guohua Wang, Shan Huang

Motivation: Predicting microbe-drug associations (MDgAs) is critical for understanding the role of microbes in drug metabolism, exploring their interactions with host physiology, and advancing personalized therapy. However, traditional methods face challenges in dealing with data sparsity, information imbalance, and the extraction of complex biological knowledge, which limit the accurate prediction of microbe-drug associations. Therefore, developing a computational model that can efficiently integrate multi-source data and address the challenges of data sparsity and information imbalance is essential.

Results: The paper proposes a model that integrates knowledge graphs and contrastive learning. By constructing both local and non-local association graphs, the model effectively captures the complex relationships between microbes and drugs. We preprocess and model the embedding representations of microbes and drugs, and design a multi-level interactive contrastive learning mechanism to optimize the information flow both within and outside the graph. Experimental results show that our model significantly outperforms existing methods in metrics such as AUC and AUPR, providing an efficient solution for predicting microbe-drug associations.

Availability and implementation: The source code is available at: https://github.com/SJshujuan/KGCLMDA. The code used in this study is also available on Zenodo: https://doi.org/10.5281/zenodo.16754402.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448813/pdf/

Citations: 0

DiCARN-DNase: enhancing cell-to-cell Hi-C resolution using dilated cascading ResNet with self-attention and DNase-seq chromatin accessibility data.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-09-01 DOI: 10.1093/bioinformatics/btaf452

Samuel Olowofila, Oluwatosin Oluwadare

Motivation: The spatial organization of chromatin is fundamental to gene regulation and essential for proper cellular function. The Hi-C technique remains the leading method for unraveling 3D genome structures, but the limited availability of high-resolution (HR) Hi-C data poses significant challenges for comprehensive analysis. Deep learning models have been developed to predict HR Hi-C data from low-resolution counterparts. Early Convolutional Neural Network (CNN)-based models improved resolution but struggled with issues like blurring and capturing fine details. In contrast, Generative Adversarial Network (GAN)-based methods encountered difficulties in maintaining diversity and generalization. Additionally, most existing algorithms perform poorly in cross-cell line generalization, where a model trained on one cell type is used to enhance HR data in another cell type.

Results: In this work, we propose Dilated Cascading Residual Network (DiCARN) to overcome these challenges and improve Hi-C data resolution. DiCARN leverages dilated convolutions and cascading residuals to capture a broader context while preserving fine-grained genomic interactions. Additionally, we incorporate DNase-seq data into our model, providing a robust framework that demonstrates superior generalizability across cell lines in HR Hi-C data reconstruction.

Availability and implementation: DiCARN is publicly available at https://github.com/OluwadareLab/DiCARN.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448792/pdf/
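
The DiCARN abstract credits dilated convolutions with capturing a broader context. The sketch below shows that idea in one dimension with plain Python: spacing the kernel taps apart widens the window each output sees without adding weights. It is a toy illustration under simplifying assumptions (1D input, no padding, stride 1, hypothetical helper names); DiCARN itself operates on 2D Hi-C contact maps and its actual layer configuration is not given in the abstract.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1D cross-correlation with `dilation` spacing between taps."""
    span = (len(kernel) - 1) * dilation + 1  # input positions covered by one output
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]

def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

signal = list(range(10))

# Same three weights; only the spacing between taps changes.
dense = dilated_conv1d(signal, [1, 1, 1], dilation=1)  # each output sees 3 inputs in a row
wide = dilated_conv1d(signal, [1, 1, 1], dilation=2)   # each output spans 5 input positions

# Three stacked 3-tap layers with dilations 1, 2, 4 cover 15 input positions.
stacked_field = receptive_field(3, [1, 2, 4])
```

Stacking layers with growing dilation rates is the usual way to enlarge context quickly while keeping the parameter count of small kernels.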

Citations: 0

Flashzoi: an enhanced Borzoi for accelerated genomic analysis.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-09-01 DOI: 10.1093/bioinformatics/btaf467

Johannes C Hingerl, Alexander Karollus, Julien Gagneur

Motivation: Accurately predicting how DNA sequence drives gene regulation and how genetic variants alter gene expression is a central challenge in genomics. Borzoi, which models over ten thousand genomic assays, including RNA-seq coverage, from over half a megabase of sequence context alone, promises to become an important foundation model in regulatory genomics, both for massively annotating variants and for further model development. However, the currently used relative positional encodings limit Borzoi's computational efficiency.

Results: We present Flashzoi, an enhanced Borzoi model that leverages rotary positional encodings and FlashAttention-2. This achieves over 3-fold faster training and inference and up to 2.4-fold reduced memory usage, while maintaining or improving accuracy in modeling various genomic assays including RNA-seq coverage, predicting variant effects, and enhancer-promoter linking. Flashzoi's improved efficiency facilitates large-scale genomic analyses and opens avenues for exploring more complex regulatory mechanisms and modeling.

Availability and implementation: The Flashzoi model architecture is part of the MIT-licensed borzoi-pytorch package, which can be found at https://github.com/johahi/borzoi-pytorch and installed via pip. Model weights for all four Flashzoi and Borzoi replicates are available at https://huggingface.co/johahi under the MIT license. The code has been archived at https://zenodo.org/records/15669913.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12457734/pdf/
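
The Flashzoi abstract attributes part of its speed-up to rotary positional encodings. The sketch below illustrates the general technique in plain Python: each consecutive pair of vector components is rotated by an angle proportional to the token position, so the query-key dot product depends only on the offset between positions. The function names, the `base` value of 10000, and the toy vectors are assumptions for illustration; this is not code from borzoi-pytorch.

```python
import math

def rope(vec, position, base=10000.0):
    """Apply a rotary positional encoding to an even-length vector.

    Component pairs (0,1), (2,3), ... are rotated by position * base**(-i/d),
    where i is the index of the pair's first component and d = len(vec).
    """
    d = len(vec)
    assert d % 2 == 0, "rotary encoding needs an even number of components"
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]   # toy query vector
k = [1.1, 0.4, -0.6, 0.9]   # toy key vector

# Both pairs of positions are 3 apart, so the attention scores should match.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))
```

Since the position information is folded into the queries and keys themselves, no separate relative-position term has to be added to the attention scores, which is what lets this encoding be combined with fused attention kernels.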

Citations: 0