Discriminative discovery of transcription factor binding sites from location data.

Yuji Kawada, Yasubumi Sakakibara
{"title":"Discriminative discovery of transcription factor binding sites from location data.","authors":"Yuji Kawada,&nbsp;Yasubumi Sakakibara","doi":"10.1109/csb.2005.30","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>The availability of genome-wide location analyses based on chromatin immunoprecipitation (ChIP) data gives a new insight for in silico analysis of transcriptional regulations.</p><p><strong>Results: </strong>We propose a novel discriminative discovery framework for precisely identifying transcriptional regulatory motifs from both positive and negative samples (sets of upstream sequences of both bound and unbound genes by a transcription factor (TF)) based on the genome-wide location data. In this framework, our goal is to find such discriminative motifs that best explain the location data in the sense that the motifs precisely discriminate the positive samples from the negative ones. First, in order to discover an initial set of discriminative substrings between positive and negative samples, we apply a decision tree learning method which produces a text-classification tree. We extract several clusters consisting of similar substrings from the internal nodes of the learned tree. Second, we start with initial profile-HMMs constructed from each cluster for representing putative motifs and iteratively refine the profile-HMMs to improve the discrimination accuracies. Our genome-wide experimental results on yeast show that our method successfully identifies the consensus sequences for known TFs in the literature and further presents significant performances for discriminating between positive and negative samples in all the TFs, while most other motif detecting methods show very poor performances on the problem of discriminations. Our learned profile-HMMs also improve false negative predictions of ChIP data.</p>","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"86-9"},"PeriodicalIF":0.0000,"publicationDate":"2005-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2005.30","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE Computational Systems Bioinformatics Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/csb.2005.30","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Motivation: The availability of genome-wide location analyses based on chromatin immunoprecipitation (ChIP) data gives a new insight for in silico analysis of transcriptional regulations.

Results: We propose a novel discriminative discovery framework for precisely identifying transcriptional regulatory motifs from both positive and negative samples (sets of upstream sequences of both bound and unbound genes by a transcription factor (TF)) based on the genome-wide location data. In this framework, our goal is to find such discriminative motifs that best explain the location data in the sense that the motifs precisely discriminate the positive samples from the negative ones. First, in order to discover an initial set of discriminative substrings between positive and negative samples, we apply a decision tree learning method which produces a text-classification tree. We extract several clusters consisting of similar substrings from the internal nodes of the learned tree. Second, we start with initial profile-HMMs constructed from each cluster for representing putative motifs and iteratively refine the profile-HMMs to improve the discrimination accuracies. Our genome-wide experimental results on yeast show that our method successfully identifies the consensus sequences for known TFs in the literature and further presents significant performances for discriminating between positive and negative samples in all the TFs, while most other motif detecting methods show very poor performances on the problem of discriminations. Our learned profile-HMMs also improve false negative predictions of ChIP data.

从定位数据中鉴别发现转录因子结合位点。
动机:基于染色质免疫沉淀(ChIP)数据的全基因组定位分析的可用性为转录调控的计算机分析提供了新的见解。结果:我们提出了一个新的鉴别发现框架,用于基于全基因组定位数据精确识别阳性和阴性样本(转录因子(TF)结合和未结合基因的上游序列集)的转录调控基序。在这个框架中,我们的目标是找到这样的判别基序,在基序精确区分阳性样本和阴性样本的意义上,最好地解释位置数据。首先,为了在正样本和负样本之间发现一组初始的判别子串,我们采用决策树学习方法生成文本分类树。我们从学习树的内部节点中提取由相似子串组成的几个簇。其次,我们从每个聚类构建初始轮廓hmm开始,用于表示假定的基序,并迭代改进轮廓hmm以提高识别精度。我们在酵母上的全基因组实验结果表明,我们的方法成功地识别了文献中已知的tf的共识序列,并进一步在所有tf的阳性和阴性样本区分方面表现出显著的性能,而大多数其他基序检测方法在区分问题上表现得非常差。我们学习的侧写- hmm也改善了ChIP数据的假阴性预测。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信