Edahi Gonzalez-Avalos, Atsushi Onodera, Daniela Samaniego-Castruita, Anjana Rao, Ferhat Ay
{"title":"Predicting gene expression state and prioritizing putative enhancers using 5hmC signal.","authors":"Edahi Gonzalez-Avalos, Atsushi Onodera, Daniela Samaniego-Castruita, Anjana Rao, Ferhat Ay","doi":"10.1186/s13059-024-03273-z","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Like its parent base 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC) is a direct epigenetic modification of cytosines in the context of CpG dinucleotides. 5hmC is the most abundant oxidized form of 5mC, generated through the action of TET dioxygenases at gene bodies of actively-transcribed genes and at active or lineage-specific enhancers. Although such enrichments are reported for 5hmC, to date, predictive models of gene expression state or putative regulatory regions for genes using 5hmC have not been developed.</p><p><strong>Results: </strong>Here, by using only 5hmC enrichment in genic regions and their vicinity, we develop neural network models that predict gene expression state across 49 cell types. We show that our deep neural network models distinguish high vs low expression state utilizing only 5hmC levels and these predictive models generalize to unseen cell types. Further, in order to leverage 5hmC signal in distal enhancers for expression prediction, we employ an Activity-by-Contact model and also develop a graph convolutional neural network model with both utilizing Hi-C data and 5hmC enrichment to prioritize enhancer-promoter links. These approaches identify known and novel putative enhancers for key genes in multiple immune cell subsets.</p><p><strong>Conclusions: </strong>Our work highlights the importance of 5hmC in gene regulation through proximal and distal mechanisms and provides a framework to link it to genome function. With the recent advances in 6-letter DNA sequencing by short and long-read techniques, profiling of 5mC and 5hmC may be done routinely in the near future, hence, providing a broad range of applications for the methods developed here.</p>","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":null,"pages":null},"PeriodicalIF":10.1000,"publicationDate":"2024-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11145787/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genome Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13059-024-03273-z","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOTECHNOLOGY & APPLIED MICROBIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Like its parent base 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC) is a direct epigenetic modification of cytosines in the context of CpG dinucleotides. 5hmC is the most abundant oxidized form of 5mC, generated through the action of TET dioxygenases at gene bodies of actively-transcribed genes and at active or lineage-specific enhancers. Although such enrichments are reported for 5hmC, to date, predictive models of gene expression state or putative regulatory regions for genes using 5hmC have not been developed.
Results: Here, by using only 5hmC enrichment in genic regions and their vicinity, we develop neural network models that predict gene expression state across 49 cell types. We show that our deep neural network models distinguish high vs low expression state utilizing only 5hmC levels and these predictive models generalize to unseen cell types. Further, in order to leverage 5hmC signal in distal enhancers for expression prediction, we employ an Activity-by-Contact model and also develop a graph convolutional neural network model with both utilizing Hi-C data and 5hmC enrichment to prioritize enhancer-promoter links. These approaches identify known and novel putative enhancers for key genes in multiple immune cell subsets.
Conclusions: Our work highlights the importance of 5hmC in gene regulation through proximal and distal mechanisms and provides a framework to link it to genome function. With the recent advances in 6-letter DNA sequencing by short and long-read techniques, profiling of 5mC and 5hmC may be done routinely in the near future, hence, providing a broad range of applications for the methods developed here.
Genome BiologyBiochemistry, Genetics and Molecular Biology-Genetics
CiteScore
21.00
自引率
3.30%
发文量
241
审稿时长
2 months
期刊介绍:
Genome Biology stands as a premier platform for exceptional research across all domains of biology and biomedicine, explored through a genomic and post-genomic lens.
With an impressive impact factor of 12.3 (2022),* the journal secures its position as the 3rd-ranked research journal in the Genetics and Heredity category and the 2nd-ranked research journal in the Biotechnology and Applied Microbiology category by Thomson Reuters. Notably, Genome Biology holds the distinction of being the highest-ranked open-access journal in this category.
Our dedicated team of highly trained in-house Editors collaborates closely with our esteemed Editorial Board of international experts, ensuring the journal remains on the forefront of scientific advances and community standards. Regular engagement with researchers at conferences and institute visits underscores our commitment to staying abreast of the latest developments in the field.