Predicting gene expression from DNA sequence using deep learning models

IF 52, CAS Zone 1 (Biology), JCR Q1 (Genetics & Heredity)

Nature Reviews Genetics, Pub Date: 2025-05-13, DOI: 10.1038/s41576-025-00841-2

Lucía Barbadilla-Martínez, Noud Klaassen, Bas van Steensel, Jeroen de Ridder

Nature Reviews Genetics, volume 26, issue 10, pages 666-680.

Citations: 0

Abstract

Transcription of genes is regulated by DNA elements such as promoters and enhancers, the activity of which is in turn controlled by many transcription factors. Owing to the highly complex combinatorial logic involved, it has been difficult to construct computational models that predict gene activity from DNA sequence. Recent advances in deep learning techniques applied to data from epigenome mapping and high-throughput reporter assays have made substantial progress towards addressing this complexity. Such models can capture the regulatory grammar with remarkable accuracy and show great promise in predicting the effects of non-coding variants, uncovering detailed molecular mechanisms of gene regulation and designing synthetic regulatory elements for biotechnology. Here, we discuss the principles of these approaches, the types of training data sets that are available and the strengths and limitations of different approaches.

Barbadilla-Martínez et al. review recent progress in deep-learning-based sequence-to-expression models, which predict gene expression levels solely from DNA sequence. These models are providing new insights into the complex combinatorial logic underlying cis-regulatory control of gene expression.

Source journal

Nature Reviews Genetics (Biology: Genetics)

CiteScore: 57.40
Self-citation rate: 0.50%
Articles published: 113
Review time: 6-12 weeks

About the journal: At Nature Reviews Genetics, our goal is to be the leading source of reviews and commentaries for the scientific communities we serve. We are dedicated to publishing authoritative articles that are easily accessible to our readers. We believe in enhancing our articles with clear and understandable figures, tables, and other display items. Our aim is to provide an unparalleled service to authors, referees, and readers, and we are committed to maximizing the usefulness and impact of each article we publish. Within our journal, we publish a range of content including Research Highlights, Comments, Reviews, and Perspectives that are relevant to geneticists and genomicists. With our broad scope, we ensure that the articles we publish reach the widest possible audience. As part of the Nature Reviews portfolio of journals, we strive to uphold the high standards and reputation associated with this esteemed collection of publications.