OmniGenome: Aligning RNA Sequences with Secondary Structures in Genomic Foundation Models

arXiv - QuanBio - Genomics | Pub Date: 2024-07-15 | DOI: arxiv-2407.11242

Heng Yang, Ke Li


Citations: 0


OmniGenome: Aligning RNA Sequences with Secondary Structures in Genomic Foundation Models

RNA structure plays a vital role in many cellular processes, yet existing genomic foundation models (FMs) struggle with precise sequence-structure alignment because the number of possible nucleotide-base combinations grows exponentially. In this study, we introduce OmniGenome, a foundation model that addresses this critical challenge of sequence-structure alignment in RNA FMs. OmniGenome bridges sequences and secondary structures through structure-contextualized modeling, enabling hard in-silico genomic tasks that existing FMs cannot handle, such as RNA design. Results on two comprehensive genomic benchmarks show that OmniGenome achieves state-of-the-art performance on complex RNA subtasks. For example, OmniGenome solved 74% of complex puzzles, whereas SpliceBERT solved only 3%. Moreover, OmniGenome solves most puzzles within 1 hour, while existing methods usually allocate 24 hours to each puzzle. Overall, OmniGenome establishes a wide range of genomic application cases and offers profound insights into biological mechanisms from the perspective of sequence-structure alignment.
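To make the notion of sequence-structure compatibility behind RNA design more concrete, the following is a minimal sketch, not OmniGenome's method or API. It assumes the secondary structure is written in dot-bracket notation (a common convention that the abstract does not itself mention) and that only Watson-Crick and G-U wobble pairs are allowed. The function names `parse_dot_bracket` and `is_compatible` are hypothetical. The check is only a necessary condition: a compatible sequence is not guaranteed to fold into the target structure.

```python
# Illustrative sketch only: not OmniGenome's method or API.
# Assumption: structures use dot-bracket notation without pseudoknots.

# Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the sorted (i, j) index pairs encoded by a dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((stack.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_compatible(sequence: str, structure: str) -> bool:
    """Check that every paired position holds two bases that can pair."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return all(
        (sequence[i], sequence[j]) in ALLOWED_PAIRS
        for i, j in parse_dot_bracket(structure)
    )


if __name__ == "__main__":
    target = "((((....))))"  # a small hairpin: 4-pair stem, 4-base loop
    print(parse_dot_bracket(target))
    print(is_compatible("GGGAAAAAUCCC", target))  # every stem position can pair
    print(is_compatible("GGGAAAAAACCC", target))  # positions 3 and 8 are A-A
```

In a design setting, a check like this would typically serve only as a cheap filter before a folding tool or model evaluates whether a candidate sequence actually adopts the target structure.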
