Lucía Barbadilla-Martínez, Noud Klaassen, Bas van Steensel, Jeroen de Ridder
Nature Reviews Genetics, published 2025-05-13. DOI: 10.1038/s41576-025-00841-2 (https://doi.org/10.1038/s41576-025-00841-2)
Predicting gene expression from DNA sequence using deep learning models
Transcription of genes is regulated by DNA elements such as promoters and enhancers, the activity of which is in turn controlled by many transcription factors. Owing to the highly complex combinatorial logic involved, it has been difficult to construct computational models that predict gene activity from DNA sequence. Recent advances in deep learning techniques applied to data from epigenome mapping and high-throughput reporter assays have made substantial progress towards addressing this complexity. Such models can capture the regulatory grammar with remarkable accuracy and show great promise in predicting the effects of non-coding variants, uncovering detailed molecular mechanisms of gene regulation and designing synthetic regulatory elements for biotechnology. Here, we discuss the principles of these approaches, the types of training data sets that are available and the strengths and limitations of different approaches.
About the journal:
At Nature Reviews Genetics, our goal is to be the leading source of reviews and commentaries for the scientific communities we serve. We are dedicated to publishing authoritative articles that are easily accessible to our readers. We believe in enhancing our articles with clear and understandable figures, tables, and other display items. Our aim is to provide an unparalleled service to authors, referees, and readers, and we are committed to maximizing the usefulness and impact of each article we publish.
Within our journal, we publish a range of content including Research Highlights, Comments, Reviews, and Perspectives that are relevant to geneticists and genomicists. With our broad scope, we ensure that the articles we publish reach the widest possible audience.
As part of the Nature Reviews portfolio of journals, we strive to uphold the high standards and reputation associated with this esteemed collection of publications.