{"title":"利用数百万个酵母启动子预测基因表达揭示了顺式调控逻辑。","authors":"Tirtharaj Dash, Susanne Bornelöv","doi":"10.1093/bioadv/vbaf130","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>Gene regulation involves complex interactions between transcription factors. While early attempts to predict gene expression were trained using naturally occurring promoters, gigantic parallel reporter assays have vastly expanded potential training data. Despite this, it is still unclear how to best use deep learning to study gene regulation. Here, we investigate the association between promoters and expression using Camformer, a residual convolutional neural network that ranked fourth in the Random Promoter DREAM Challenge 2022. We present the original model trained on 6.7 million sequences and investigate 270 alternative models to find determinants of model performance. Finally, we use explainable AI to uncover regulatory signals.</p><p><strong>Results: </strong>Camformer accurately decodes the association between promoters and gene expression ( <math> <mrow> <mrow> <msup><mrow><mi>r</mi></mrow> <mn>2</mn></msup> </mrow> <mo>=</mo> <mn>0.914</mn> <mo> ± </mo> <mn>0.003</mn></mrow> </math> , <math><mrow><mi>ρ</mi> <mo>=</mo> <mn>0.962</mn> <mo> ± </mo> <mn>0.002</mn></mrow> </math> ) and provides a substantial improvement over previous state of the art. Using Grad-CAM and in silico mutagenesis, we demonstrate that our model learns both individual motifs and their hierarchy. For example, while an IME1 motif on its own increases gene expression, a co-occurring UME6 motif instead strongly reduces gene expression. Thus, deep learning models such as Camformer can provide detailed insights into <i>cis</i>-regulatory logic.</p><p><strong>Availability and implementation: </strong>Data and code are available at: https://github.com/Bornelov-lab/Camformer.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf130"},"PeriodicalIF":2.4000,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12188188/pdf/","citationCount":"0","resultStr":"{\"title\":\"Predicting gene expression using millions of yeast promoters reveals <i>cis</i>-regulatory logic.\",\"authors\":\"Tirtharaj Dash, Susanne Bornelöv\",\"doi\":\"10.1093/bioadv/vbaf130\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Motivation: </strong>Gene regulation involves complex interactions between transcription factors. While early attempts to predict gene expression were trained using naturally occurring promoters, gigantic parallel reporter assays have vastly expanded potential training data. Despite this, it is still unclear how to best use deep learning to study gene regulation. Here, we investigate the association between promoters and expression using Camformer, a residual convolutional neural network that ranked fourth in the Random Promoter DREAM Challenge 2022. We present the original model trained on 6.7 million sequences and investigate 270 alternative models to find determinants of model performance. 
Finally, we use explainable AI to uncover regulatory signals.</p><p><strong>Results: </strong>Camformer accurately decodes the association between promoters and gene expression ( <math> <mrow> <mrow> <msup><mrow><mi>r</mi></mrow> <mn>2</mn></msup> </mrow> <mo>=</mo> <mn>0.914</mn> <mo> ± </mo> <mn>0.003</mn></mrow> </math> , <math><mrow><mi>ρ</mi> <mo>=</mo> <mn>0.962</mn> <mo> ± </mo> <mn>0.002</mn></mrow> </math> ) and provides a substantial improvement over previous state of the art. Using Grad-CAM and in silico mutagenesis, we demonstrate that our model learns both individual motifs and their hierarchy. For example, while an IME1 motif on its own increases gene expression, a co-occurring UME6 motif instead strongly reduces gene expression. Thus, deep learning models such as Camformer can provide detailed insights into <i>cis</i>-regulatory logic.</p><p><strong>Availability and implementation: </strong>Data and code are available at: https://github.com/Bornelov-lab/Camformer.</p>\",\"PeriodicalId\":72368,\"journal\":{\"name\":\"Bioinformatics advances\",\"volume\":\"5 1\",\"pages\":\"vbaf130\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2025-06-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12188188/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics advances\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/bioadv/vbaf130\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics advances","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioadv/vbaf130","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
Predicting gene expression using millions of yeast promoters reveals cis-regulatory logic.
Motivation: Gene regulation involves complex interactions between transcription factors. While early models for predicting gene expression were trained on naturally occurring promoters, gigantic parallel reporter assays have vastly expanded the available training data. Despite this, it remains unclear how best to use deep learning to study gene regulation. Here, we investigate the association between promoter sequence and expression using Camformer, a residual convolutional neural network that ranked fourth in the Random Promoter DREAM Challenge 2022. We present the original model, trained on 6.7 million sequences, and investigate 270 alternative models to identify determinants of model performance. Finally, we use explainable AI to uncover regulatory signals.
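To make the model description concrete, below is a minimal sketch of a residual convolutional network for one-hot encoded promoter sequences. It only illustrates the general idea of convolutional blocks with skip connections feeding a regression head; the channel count, kernel size, block count, and input length (110 bp) are assumptions for illustration and not the actual Camformer configuration (see the repository for that).

```python
# Hypothetical residual CNN for promoter-to-expression regression.
# Not the Camformer architecture; an illustrative sketch only.
import torch
import torch.nn as nn


class ResidualConvBlock(nn.Module):
    """1D convolutional block with a skip (residual) connection."""

    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        padding = kernel_size // 2  # keep the sequence length unchanged
        self.block = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=padding),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=padding),
            nn.BatchNorm1d(channels),
        )
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(x + self.block(x))  # residual connection


class PromoterRegressor(nn.Module):
    """Maps a one-hot promoter (batch, 4, length) to a scalar expression value."""

    def __init__(self, channels: int = 64, n_blocks: int = 3):
        super().__init__()
        self.stem = nn.Conv1d(4, channels, kernel_size=7, padding=3)
        self.blocks = nn.Sequential(
            *[ResidualConvBlock(channels) for _ in range(n_blocks)]
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(channels, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.blocks(self.stem(x))).squeeze(-1)


if __name__ == "__main__":
    model = PromoterRegressor()
    x = torch.randn(8, 4, 110)  # random inputs with the one-hot shape (length 110 assumed)
    print(model(x).shape)  # torch.Size([8])
```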
Results: Camformer accurately decodes the association between promoters and gene expression (r² = 0.914 ± 0.003, ρ = 0.962 ± 0.002) and provides a substantial improvement over the previous state of the art. Using Grad-CAM and in silico mutagenesis, we demonstrate that our model learns both individual motifs and their hierarchy. For example, while an IME1 motif on its own increases gene expression, a co-occurring UME6 motif instead strongly reduces gene expression. Thus, deep learning models such as Camformer can provide detailed insights into cis-regulatory logic.
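As an illustration of the in silico mutagenesis procedure referenced above, the sketch below mutates every position of a one-hot encoded promoter to each alternative base and records the change in predicted expression. The function name and the stand-in model are hypothetical; this is not the authors' analysis code, only a minimal example of the technique under the assumption that the model maps a (4, length) one-hot tensor to a scalar prediction.

```python
# Minimal in silico mutagenesis sketch (hypothetical, not the authors' code).
import torch


def in_silico_mutagenesis(model: torch.nn.Module, onehot: torch.Tensor) -> torch.Tensor:
    """Return a (4, length) matrix of predicted-expression changes.

    onehot: tensor of shape (4, length) with exactly one 1.0 per column.
    Entry (b, i) holds prediction(mutant with base b at position i) minus
    prediction(reference); reference-base entries stay 0.
    """
    model.eval()
    with torch.no_grad():
        ref = model(onehot.unsqueeze(0)).item()  # prediction for the reference sequence
        n_bases, length = onehot.shape
        effects = torch.zeros(n_bases, length)
        for i in range(length):
            for b in range(n_bases):
                if onehot[b, i] == 1:
                    continue  # skip the reference base at this position
                mutant = onehot.clone()
                mutant[:, i] = 0.0
                mutant[b, i] = 1.0
                effects[b, i] = model(mutant.unsqueeze(0)).item() - ref
    return effects


if __name__ == "__main__":
    # Untrained stand-in model, only to exercise the function end to end.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(4 * 110, 1))
    seq = torch.zeros(4, 110)
    seq[torch.randint(0, 4, (110,)), torch.arange(110)] = 1.0  # random one-hot promoter
    print(in_silico_mutagenesis(model, seq).shape)  # torch.Size([4, 110])
```

Large positive or negative entries in the resulting matrix point to positions, such as the IME1 and UME6 motif instances discussed above, where a single-base change strongly shifts the predicted expression.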
Availability and implementation: Data and code are available at: https://github.com/Bornelov-lab/Camformer.