OmniGenome: Aligning RNA Sequences with Secondary Structures in Genomic Foundation Models
Heng Yang, Ke Li
arXiv - QuanBio - Genomics, 2024-07-15
DOI: arxiv-2407.11242 (https://doi.org/arxiv-2407.11242)
RNA structure plays a vital role in many cellular
processes, yet existing genomic foundation models (FMs) struggle with
precise sequence-structure alignment because the number of possible
nucleotide combinations grows exponentially with sequence length. In this study, we introduce OmniGenome, a
foundation model that addresses this critical challenge of sequence-structure
alignment in RNA FMs. OmniGenome bridges sequences and secondary
structures through structure-contextualized modeling, enabling hard in-silico
genomic tasks that existing FMs cannot handle, such as RNA design. The
results on two comprehensive genomic benchmarks show that OmniGenome achieves
state-of-the-art performance on complex RNA subtasks. For example, OmniGenome
solved 74% of complex puzzles, whereas SpliceBERT solved only 3%.
Moreover, OmniGenome solves most of the puzzles within 1 hour,
while existing methods usually allocate 24 hours for each puzzle.
Overall, OmniGenome supports a wide range of genomic applications and offers
profound insights into biological mechanisms from the perspective of
sequence-structure alignment.
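
To make the idea of pairing a sequence with a secondary structure concrete, here is a minimal sketch. It is not OmniGenome's code or API. It assumes the structure is written in dot-bracket notation (a common convention the abstract does not mention) and that only Watson-Crick and G-U wobble pairs are allowed. The function names are hypothetical. The check is only a necessary condition for a design candidate: it does not fold the sequence or predict its structure.

```python
# Illustrative sketch only; names and conventions here are assumptions,
# not part of OmniGenome.

# Assumed set of allowed base pairs: Watson-Crick plus G-U wobble.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def parse_dot_bracket(structure):
    """Return sorted (i, j) index pairs for matching parentheses."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_compatible(sequence, structure):
    """Check that every paired position holds an allowed base pair."""
    if len(sequence) != len(structure):
        return False
    return all(
        (sequence[i], sequence[j]) in CANONICAL_PAIRS
        for i, j in parse_dot_bracket(structure)
    )


def design_space_size(length):
    """Number of distinct RNA sequences of a given length (4 bases)."""
    return 4 ** length
```

For a target structure `((..))`, `is_compatible("GCAAGC", "((..))")` holds because positions 0-5 and 1-4 form G-C and C-G pairs, while `design_space_size(10)` already exceeds one million candidates, which illustrates why exhaustive search over sequences quickly becomes impractical.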