Interpreting deep neural networks for the prediction of translation rates.

IF 3.5 · Zone 2 (Biology) · Q2 BIOTECHNOLOGY & APPLIED MICROBIOLOGY
Frederick Korbel, Ekaterina Eroshok, Uwe Ohler
BMC Genomics 25(1):1061. Published 2024-11-09. Journal Article.
DOI: 10.1186/s12864-024-10925-8 (https://doi.org/10.1186/s12864-024-10925-8)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11549864/pdf/

Abstract

Background: The 5' untranslated region (5'UTR) of mRNA strongly impacts the rate of translation initiation. A recent convolutional neural network (CNN) model accurately quantifies the relationship between massively parallel synthetic 5'UTRs and translation levels. However, the underlying biological features that drive model predictions remain elusive. Uncovering sequence determinants predictive of translation output may allow us to develop a more detailed understanding of translation regulation at the 5'UTR.
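
As an illustration of how a sequence-based CNN typically receives a 5'UTR, the sketch below one-hot encodes a sequence into a fixed-size matrix. The abstract does not describe the input encoding, so everything here is an assumption made for illustration: the helper name `one_hot_encode`, the default window of 50 nt, the left zero-padding, and the handling of ambiguous bases.

```python
# Hypothetical helper, not taken from the paper: one-hot encode a 5'UTR
# so that it could be fed to a sequence-based CNN.
ALPHABET = "ACGT"


def one_hot_encode(seq, length=50):
    """Return a `length` x 4 matrix (list of lists) for `seq`.

    Assumptions (illustrative only):
    - RNA input is accepted; U is mapped to T.
    - Sequences longer than `length` keep their 3' end, the part closest
      to the main start codon.
    - Shorter sequences are left-padded with all-zero rows.
    - Any base outside A/C/G/T (for example N) becomes an all-zero row.
    """
    seq = seq.upper().replace("U", "T")
    seq = seq[-length:]
    padding = length - len(seq)
    rows = [[0, 0, 0, 0] for _ in range(padding)]
    for base in seq:
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


if __name__ == "__main__":
    for row in one_hot_encode("ACGU", length=6):
        print(row)
```

Aligning sequences at their 3' end keeps positions comparable relative to the main start codon, which is one common choice when sequence lengths vary; other padding schemes are equally plausible.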

Results: Applying model interpretation, we extract representations of regulatory logic from CNNs trained on synthetic and human 5'UTR reporter data. We reveal that a complex interplay of regulatory sequence elements, such as initiation context and upstream open reading frames (uORFs), influences model predictions. We show that models trained on synthetic data alone do not sufficiently explain translation regulation via the 5'UTR, because the frequency of regulatory motifs in synthetic sequences differs from that in natural 5'UTRs.
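
To make the sequence elements named above concrete, the following sketch scans a 5'UTR for upstream ATGs, records the bases at positions -3 and +4 of the initiation context, and classifies each upstream start by whether an in-frame stop codon occurs within the UTR. It is not the authors' interpretation method. The function name `find_upstream_starts`, the output fields, and the assumption that the main start codon immediately follows the supplied UTR are all hypothetical.

```python
# Illustrative sketch only: annotate upstream start codons in a 5'UTR.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_upstream_starts(utr):
    """Return one dict per upstream ATG found in `utr`.

    Assumption: the main start codon directly follows the last base of `utr`,
    so an upstream ATG is in frame with the main ORF when its distance to the
    end of the UTR is a multiple of three.
    """
    utr = utr.upper().replace("U", "T")
    found = []
    for i in range(len(utr) - 2):
        if utr[i:i + 3] != START:
            continue
        # Walk codon by codon from the ATG, looking for a stop inside the UTR.
        stop = None
        for j in range(i + 3, len(utr) - 2, 3):
            if utr[j:j + 3] in STOPS:
                stop = j
                break
        in_frame = (len(utr) - i) % 3 == 0
        if stop is not None:
            kind = "uORF"                    # terminates within the 5'UTR
        elif in_frame:
            kind = "N-terminal extension"    # reads into the main ORF in frame
        else:
            kind = "overlapping uORF"        # reads into the main ORF out of frame
        found.append({
            "position": i,
            "kind": kind,
            "stop": stop,
            # Positions -3 and +4 relative to the A of the ATG (A = +1).
            "minus3": utr[i - 3] if i >= 3 else None,
            "plus4": utr[i + 3] if i + 3 < len(utr) else None,
        })
    return found


if __name__ == "__main__":
    for hit in find_upstream_starts("GGACCATGGCTTAAGG"):
        print(hit)
```

Counting such hits across a synthetic library and across natural 5'UTRs is one simple way to compare how often these motifs occur in each data set.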

Conclusions: Our study demonstrates the significance of model interpretation in understanding model behavior, properties of experimental data and ultimately mRNA translation. By combining synthetic and human 5'UTR reporter data, we develop a model (OptMRL) which better captures the characteristics of human translation regulation. This approach provides a general strategy for building more successful sequence-based models of gene regulation, as it combines global sampling of random sequences with the subspace of naturally occurring sequences. Ultimately, this will enhance our understanding of 5'UTR sequences in disease and our ability to engineer translation output.

Source journal
BMC Genomics (Biology: Biotechnology & Applied Microbiology)
CiteScore: 7.40
Self-citation rate: 4.50%
Articles published: 769
Review time: 6.4 months
About the journal: BMC Genomics is an open access, peer-reviewed journal that considers articles on all aspects of genome-scale analysis, functional genomics, and proteomics. BMC Genomics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.