OmniGenome: Aligning RNA Sequences with Secondary Structures in Genomic Foundation Models

Heng Yang, Ke Li
arXiv - QuanBio - Genomics, published 2024-07-15. DOI: arxiv-2407.11242 (https://doi.org/arxiv-2407.11242)
Citations: 0

Abstract

RNA structure plays a vital role in many cellular processes, yet existing genomic foundation models (FMs) struggle with precise sequence-structure alignment because the number of possible nucleotide base combinations grows exponentially. In this study, we introduce OmniGenome, a foundation model that addresses this challenge of sequence-structure alignment in RNA FMs. OmniGenome bridges sequences and secondary structures through structure-contextualized modeling, enabling hard in-silico genomic tasks that existing FMs cannot handle, such as RNA design. Results on two comprehensive genomic benchmarks show that OmniGenome achieves state-of-the-art performance on complex RNA subtasks. For example, OmniGenome solved 74% of complex puzzles, whereas SpliceBERT solved only 3%. In addition, OmniGenome solves most of the puzzles within 1 hour, while existing methods usually allocate 24 hours to each puzzle. Overall, OmniGenome establishes a wide range of genomic application cases and offers profound insights into biological mechanisms from the perspective of sequence-structure alignment.