Predicting the effect of CRISPR-Cas9-based epigenome editing.

bioRxiv : the preprint server for biology; Pub Date: 2025-02-28; DOI: 10.1101/2023.10.03.560674

Sanjit Singh Batra, Alan Cabrera, Jeffrey P Spence, Jacob Goell, Selvalakshmi S Anand, Isaac B Hilton, Yun S Song



Abstract

Epigenetic regulation orchestrates mammalian transcription, but functional links between them remain elusive. To tackle this problem, we use epigenomic and transcriptomic data from 13 ENCODE cell types to train machine learning models to predict gene expression from histone post-translational modifications (PTMs), achieving transcriptome-wide correlations of ~0.70-0.79 for most cell types. Our models recapitulate known associations between histone PTMs and expression patterns, including predicting that acetylation of histone subunit H3 lysine residue 27 (H3K27ac) near the transcription start site (TSS) significantly increases expression levels. To validate this prediction experimentally and investigate how natural vs. engineered deposition of H3K27ac might differentially affect expression, we apply the synthetic dCas9-p300 histone acetyltransferase system to 8 genes in the HEK293T cell line and to 5 genes in the K562 cell line. Further, to facilitate model building, we perform MNase-seq to map genome-wide nucleosome occupancy levels in HEK293T. We observe that our models perform well in accurately ranking relative fold-changes among genes in response to the dCas9-p300 system; however, their ability to rank fold-changes within individual genes is noticeably diminished compared to predicting expression across cell types from their native epigenetic signatures. Our findings highlight the need for more comprehensive genome-scale epigenome editing datasets, better understanding of the actual modifications made by epigenome editing tools, and improved causal models that transfer better from endogenous cellular measurements to perturbation experiments. Together, these improvements would facilitate the ability to understand and predictably control the dynamic human epigenome, with consequences for human health.
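
The distinction the abstract draws between ranking fold-changes among genes and ranking them within a single gene can be illustrated with a small sketch. Everything below is an assumption made for illustration only: the abstract does not say which correlation measure was used (Spearman rank correlation is assumed here), the gene and guide names are hypothetical, and the numbers are synthetic rather than taken from the study.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def group_by_gene(records):
    """Map each gene to its list of (predicted, measured) pairs."""
    by_gene = {}
    for gene, _guide, pred, obs in records:
        by_gene.setdefault(gene, []).append((pred, obs))
    return by_gene


def across_gene_score(records):
    """Rank genes against each other using per-gene mean fold-changes."""
    by_gene = group_by_gene(records)
    pred_means = [mean(p for p, _ in pairs) for pairs in by_gene.values()]
    obs_means = [mean(o for _, o in pairs) for pairs in by_gene.values()]
    return spearman(pred_means, obs_means)


def within_gene_scores(records):
    """Rank the perturbations of each gene separately."""
    return {
        gene: spearman([p for p, _ in pairs], [o for _, o in pairs])
        for gene, pairs in group_by_gene(records).items()
    }


# Synthetic records (hypothetical names and values, NOT data from the paper):
# (gene, guide, predicted log fold-change, measured log fold-change)
records = [
    ("GENE_A", "g1", 0.2, 0.5), ("GENE_A", "g2", 0.4, 0.3), ("GENE_A", "g3", 0.3, 0.4),
    ("GENE_B", "g1", 1.1, 1.6), ("GENE_B", "g2", 1.3, 1.2), ("GENE_B", "g3", 1.2, 1.4),
    ("GENE_C", "g1", 2.0, 2.9), ("GENE_C", "g2", 2.4, 2.5), ("GENE_C", "g3", 2.2, 2.7),
]

if __name__ == "__main__":
    print("across genes:", across_gene_score(records))
    print("within genes:", within_gene_scores(records))
```

The toy values are deliberately extreme: the predictions order the three genes correctly while ordering the guides within each gene in reverse, so a model can score well on the first task and poorly on the second. Real data would be expected to show a less clear-cut gap.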
