Flashzoi: an enhanced Borzoi for accelerated genomic analysis.

Johannes C Hingerl, Alexander Karollus, Julien Gagneur
Bioinformatics (Oxford, England), 2025-09-01. DOI: 10.1093/bioinformatics/btaf467. Journal impact factor: 5.4.

Abstract

Motivation: Accurately predicting how DNA sequence drives gene regulation and how genetic variants alter gene expression is a central challenge in genomics. Borzoi, which models over ten thousand genomic assays, including RNA-seq coverage, from over half a megabase of sequence context alone, promises to become an important foundation model in regulatory genomics, both for annotating variants at scale and for further model development. However, the relative positional encodings it currently uses limit Borzoi's computational efficiency.

Results: We present Flashzoi, an enhanced Borzoi model that leverages rotary positional encodings and FlashAttention-2. This achieves over 3-fold faster training and inference and up to 2.4-fold reduced memory usage, while maintaining or improving accuracy in modeling various genomic assays including RNA-seq coverage, predicting variant effects, and enhancer-promoter linking. Flashzoi's improved efficiency facilitates large-scale genomic analyses and opens avenues for exploring more complex regulatory mechanisms and modeling.
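Rotary positional encodings (RoPE) encode position by rotating each query and key vector by a position-dependent angle, so that the query-key dot product depends only on the offset between the two positions. Because position is folded into the queries and keys before the attention product, the attention computation itself needs no separate positional term, which is presumably what makes RoPE compatible with fused kernels such as FlashAttention-2. The NumPy sketch below illustrates the generic RoPE formulation; the function names, the base of 10000, and the interleaved pairing of features are assumptions, and it is not Flashzoi's actual implementation.

```python
import numpy as np


def rotary_embed(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary positional encoding to x of shape (seq_len, dim).

    Assumption: standard RoPE with interleaved feature pairs and base 10000;
    Flashzoi's exact variant may differ. dim must be even.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"
    # One rotation frequency per feature pair: base^(-2i/dim)
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = positions[:, None] * inv_freq[None, :]    # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    # 2D rotation of each (even, odd) feature pair
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


def attention_with_rope(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Single-head softmax attention on RoPE-rotated queries and keys (illustrative)."""
    positions = np.arange(q.shape[0], dtype=float)
    q_rot = rotary_embed(q, positions)
    k_rot = rotary_embed(k, positions)
    # Position enters only through q_rot and k_rot: the score matrix needs no
    # additional relative-position bias, so it could be computed tile by tile.
    scores = q_rot @ k_rot.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


# Relative property: the score for (query at m, key at n) depends only on m - n.
rng = np.random.default_rng(0)
dim = 8
q = rng.standard_normal((1, dim))
k = rng.standard_normal((1, dim))


def score(m: float, n: float) -> float:
    qm = rotary_embed(q, np.array([m], dtype=float))
    kn = rotary_embed(k, np.array([n], dtype=float))
    return float(np.sum(qm * kn))


print(np.isclose(score(5, 2), score(103, 100)))  # same offset m - n = 3
```

Since each rotation is orthogonal, rotating q by angle m and k by angle n yields a dot product equal to q rotated by the difference of the two angles dotted with k, which is why absolute positions cancel and only the offset survives.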

Availability and implementation: The Flashzoi model architecture is part of the MIT-licensed borzoi-pytorch package, can be found at https://github.com/johahi/borzoi-pytorch and installed via pip. Model weights for all four Flashzoi and Borzoi replicates are available at https://huggingface.co/johahi under the MIT license. The code has been archived at https://zenodo.org/records/15669913.

Supplementary information: Supplementary data are available at Bioinformatics online.