Chromatin signature analysis and prediction of genome-wide novel promoters using finite mixture model

C. Taslim, Shili Lin, Kun Huang, T. Huang
2011 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS), December 2011. DOI: 10.1109/GENSiPS.2011.6169429
Citations: 1

Abstract

Regulation of gene expression has been shown to involve not only the binding of transcription factors to target-gene promoters but also the modification state of the histones around which DNA is wrapped. Some histone modifications, for example di-methylation of histone H3 at lysine 4 (H3K4me2), have been shown to be associated with gene activation. However, no clear chromatin pattern has yet been shown to predict human promoters. This paper proposes a novel quantitative approach to characterize the chromatin signatures and patterns of promoters, which are then used to predict novel (alternative) promoters. Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq) data for RNA Polymerase II (Pol II) and H3K4me2 are used to identify common patterns of promoter regions, and these patterns are then searched for across the entire genome to find novel promoters. The common patterns of promoter regions are modeled using a mixture of double-exponential (Laplace) and uniform distributions. Regions that correlate highly with the common patterns are identified as putative novel promoters. We applied the proposed algorithm, together with RNA-seq data, to identify novel promoters in the MCF7 cell line. We found 4,392 high-confidence regions that display the identified promoter patterns (referred to as putative novel promoters). Of these, 875 regions (20%) overlap with RNA transcripts, and around 70% overlap with RNA transcripts, ESTs, and/or non-coding RNAs, suggesting that these putative novel promoters might be promoters that have not yet been discovered.
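The abstract does not spell out how the mixture is parameterized or which correlation measure is used. The sketch below is one plausible reading, not the authors' implementation. It assumes that the "common pattern" is the distribution of read or peak offsets around an anchor such as a TSS, modeled as pi * Laplace(mu, b) + (1 - pi) * Uniform(-W, W) and fitted by EM, and that "correlation" means Pearson correlation between a binned genome-wide signal and the fitted template profile. All function names (`fit_laplace_uniform`, `template_profile`, `scan`), the window and bin sizes, and the synthetic data are hypothetical.

```python
# Hypothetical sketch, not the authors' code: EM fit of a Laplace + uniform
# mixture to offsets around an anchor, then a sliding Pearson-correlation scan.
import math
import random


def laplace_pdf(x, mu, b):
    """Double-exponential (Laplace) density with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def weighted_median(xs, ws):
    """Weighted median: minimizes sum(w * |x - m|), the Laplace location MLE."""
    pairs = sorted(zip(xs, ws))
    half, acc = sum(ws) / 2.0, 0.0
    for x, w in pairs:
        acc += w
        if acc >= half:
            return x
    return pairs[-1][0]


def fit_laplace_uniform(xs, window, n_iter=200, tol=1e-8):
    """EM for pi * Laplace(mu, b) + (1 - pi) * Uniform(-window, window).

    xs are offsets (bp) relative to an anchor. The Laplace component is not
    truncated to the window, which is only reasonable when b << window.
    """
    u = 1.0 / (2.0 * window)  # uniform background density
    pi = 0.5
    mu = weighted_median(xs, [1.0] * len(xs))
    b = max(sum(abs(x - mu) for x in xs) / len(xs), 1e-6)
    for _ in range(n_iter):
        # E-step: responsibility of the peak (Laplace) component for each point.
        r = []
        for x in xs:
            p = pi * laplace_pdf(x, mu, b)
            r.append(p / (p + (1.0 - pi) * u))
        # M-step: closed-form updates (weighted median, weighted mean |deviation|).
        new_pi = sum(r) / len(xs)
        new_mu = weighted_median(xs, r)
        new_b = max(sum(ri * abs(x - new_mu) for ri, x in zip(r, xs)) / sum(r), 1e-6)
        done = max(abs(new_pi - pi), abs(new_mu - mu), abs(new_b - b)) < tol
        pi, mu, b = new_pi, new_mu, new_b
        if done:
            break
    return pi, mu, b


def template_profile(pi, mu, b, window, bin_size):
    """Mixture density evaluated at the centre of each bin across the window."""
    u = 1.0 / (2.0 * window)
    n_bins = (2 * window) // bin_size
    centres = [-window + bin_size * (i + 0.5) for i in range(n_bins)]
    return [pi * laplace_pdf(c, mu, b) + (1.0 - pi) * u for c in centres]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def scan(signal, template, threshold):
    """(start_bin, r) for each window of a binned signal with r >= threshold."""
    k = len(template)
    hits = []
    for start in range(len(signal) - k + 1):
        r = pearson(signal[start:start + k], template)
        if r >= threshold:
            hits.append((start, r))
    return hits


if __name__ == "__main__":
    random.seed(0)
    # Synthetic offsets: 800 peak-like points around +50 bp plus 200 background.
    peak = [50 + random.choice((-1, 1)) * random.expovariate(1 / 100) for _ in range(800)]
    background = [random.uniform(-2000, 2000) for _ in range(200)]
    pi, mu, b = fit_laplace_uniform(peak + background, window=2000)
    print(f"pi={pi:.2f} mu={mu:.1f} b={b:.1f}")
```

The Laplace location update is a weighted median because it minimizes the responsibility-weighted absolute deviation, so every EM step stays in closed form. In a real pipeline the template would come from known promoters and the scan would run per chromosome over binned ChIP-seq coverage. Both of those steps are assumptions here.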