DeepEpi: Deep Learning Model for Predicting Gene Expression Regulation based on Epigenetic Histone Modifications
Rania Hamdy, Yasser M. K. Omar, F. Maghraby
Current Bioinformatics, published 2023-08-18
DOI: https://doi.org/10.2174/1574893618666230818121046
Abstract
Histone modification is a vital element of gene expression regulation. The way in which histone proteins bind to DNA affects whether a gene can be expressed. Although these factors cannot change how the DNA itself is constructed, they can influence how it is transcribed.
Each spatial location in DNA has its own function, so the spatial arrangement of chromatin modifications affects how a gene is expressed. Gene regulation is also affected by the combination of histone modification types present on the gene, and it depends on the spatial distribution of these modifications and on the length of the gene region over which they are read. This study therefore aims to model long-range spatial genome data and the complex dependencies among histone reads.
A convolutional neural network (CNN) is used to model all data features in this paper. It can detect patterns in histone signals and preserve the spatial information of these patterns. The study also uses the memory concept of long short-term memory (LSTM) networks, in vanilla, bidirectional, or stacked form, to preserve long-range histone signals. Additionally, it combines these methods either through ConvLSTM or by using them together with a self-attention mechanism.
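To make two of the building blocks named above more concrete, the following is a minimal pure-Python sketch of a 1D convolution over binned histone signals followed by scaled dot-product self-attention. It is not the DeepEpi implementation: the toy input (6 bins by 2 marks), the hand-picked filters, and the use of identity projections for queries, keys, and values are all assumptions made for illustration, and the LSTM stage is omitted entirely.

```python
import math

def conv1d(signal, kernels):
    """Valid 1D convolution (cross-correlation) followed by ReLU.

    signal:  list of bins, each a list with one value per histone mark
    kernels: list of filters, each a list of `width` rows of per-mark weights
    returns: one row per window position, one activation per filter
    """
    width = len(kernels[0])
    out = []
    for start in range(len(signal) - width + 1):
        window = signal[start:start + width]
        row = []
        for kernel in kernels:
            total = sum(w * x
                        for k_row, s_row in zip(kernel, window)
                        for w, x in zip(k_row, s_row))
            row.append(max(0.0, total))  # ReLU
        out.append(row)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

def self_attention(features):
    """Scaled dot-product self-attention with identity projections
    (queries = keys = values = features), an illustrative simplification."""
    dim = len(features[0])
    context, weights = [], []
    for query in features:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in features]
        w = softmax(scores)
        weights.append(w)
        # Each output row is an attention-weighted mix of all positions.
        context.append([sum(w_j * value[c] for w_j, value in zip(w, features))
                        for c in range(dim)])
    return context, weights

# Hypothetical toy input: 6 genomic bins x 2 histone marks.
signal = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0],
          [0.0, 0.0], [1.0, 3.0], [0.0, 1.0]]
# Two hand-picked filters of width 3 (a trained model would learn these).
kernels = [
    [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],   # sums mark 0 over the window
    [[0.0, 1.0], [0.0, -1.0], [0.0, 1.0]],  # contrasts mark 1 across bins
]

features = conv1d(signal, kernels)
context, weights = self_attention(features)
```

The convolution keeps one output row per window position, so the spatial order of the bins survives, while each attention row lets a position draw on every other position regardless of distance. In a trained model the filters and the query, key, and value projections would be learned rather than fixed.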
Based on the results, the combination of a CNN and an LSTM with the self-attention mechanism obtained an area under the curve (AUC) score of 88.87% over 56 cell types.
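As an illustration of how an AUC figure over several cell types can be computed, the sketch below calculates the AUC for each cell type from binary labels and predicted scores, then takes the mean. Treating the reported score as a plain average of per-cell-type AUCs is an assumption, since the abstract does not say how the 56 cell types were aggregated. The cell names and data are made up.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative one (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_auc(per_cell):
    """Average the per-cell-type AUCs (assumed aggregation, see lead-in)."""
    aucs = [roc_auc(labels, scores) for labels, scores in per_cell.values()]
    return sum(aucs) / len(aucs)

# Hypothetical toy data: label 1 = gene expressed, 0 = not expressed.
toy = {
    "cell_A": ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    "cell_B": ([1, 1, 0, 0], [0.8, 0.7, 0.3, 0.1]),
}

print(roc_auc(*toy["cell_A"]))  # 0.75
print(mean_auc(toy))            # 0.875
```

The pairwise formulation is quadratic in the number of genes, which is fine for a small example. A rank-based formulation gives the same value and scales better on real data.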
This result outperforms the current state-of-the-art model and provides insight into how combinatorial interactions between histone modification marks can control gene expression. The source code is available at https://github.com/RaniaHamdy/DeepEpi.
About the journal:
Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth/mini-reviews, research papers and guest edited thematic issues written by leaders in the field, covering a wide range of the integration of biology with computer and information science.
The journal focuses on advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.