Discriminative discovery of transcription factor binding sites from location data.

Proceedings. IEEE Computational Systems Bioinformatics Conference. Pub Date: 2005-01-01. DOI: 10.1109/csb.2005.30

Yuji Kawada, Yasubumi Sakakibara

Pages: 86-9

Citations: 1

Abstract

Motivation: The availability of genome-wide location analyses based on chromatin immunoprecipitation (ChIP) data offers new insight for the in silico analysis of transcriptional regulation.

Results: We propose a novel discriminative discovery framework for precisely identifying transcriptional regulatory motifs from both positive and negative samples, that is, the sets of upstream sequences of genes bound and not bound by a transcription factor (TF), based on genome-wide location data. The goal of the framework is to find the discriminative motifs that best explain the location data, in the sense that they precisely discriminate the positive samples from the negative ones. First, to discover an initial set of substrings that discriminate between positive and negative samples, we apply a decision tree learning method that produces a text-classification tree, and we extract several clusters of similar substrings from the internal nodes of the learned tree. Second, we construct an initial profile-HMM from each cluster to represent a putative motif and iteratively refine the profile-HMMs to improve discrimination accuracy. Our genome-wide experiments on yeast show that the method successfully identifies the consensus sequences reported in the literature for known TFs and achieves significant performance in discriminating between positive and negative samples for all the TFs, whereas most other motif detection methods perform very poorly on this discrimination problem. The learned profile-HMMs also improve false negative predictions of ChIP data.
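The first step, choosing a substring whose presence separates bound from unbound upstream sequences, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the split criterion or the substring length, so the sketch assumes information gain over the presence or absence of fixed-length k-mers, which is the kind of test a single node of a text-classification tree might apply. The function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def entropy(pos, neg):
    """Entropy in bits of a set holding `pos` positive and `neg` negative items."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def kmers(seq, k):
    """Set of all distinct length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_discriminative_kmer(positives, negatives, k):
    """Return (kmer, gain) for the k-mer whose presence/absence split
    yields the highest information gain (assumed criterion, see lead-in)."""
    # Count, for each k-mer, how many sequences of each class contain it.
    pos_counts = Counter(m for s in positives for m in kmers(s, k))
    neg_counts = Counter(m for s in negatives for m in kmers(s, k))
    n_pos, n_neg = len(positives), len(negatives)
    total = n_pos + n_neg
    base = entropy(n_pos, n_neg)

    best = (None, 0.0)
    # Sorted iteration makes tie-breaking deterministic.
    for kmer in sorted(set(pos_counts) | set(neg_counts)):
        p_in, n_in = pos_counts[kmer], neg_counts[kmer]
        p_out, n_out = n_pos - p_in, n_neg - n_in
        remainder = ((p_in + n_in) / total) * entropy(p_in, n_in) \
            + ((p_out + n_out) / total) * entropy(p_out, n_out)
        gain = base - remainder
        if gain > best[1]:
            best = (kmer, gain)
    return best


# Toy data invented for illustration only (not from the paper).
toy_positives = ["ACGTGACTCA", "TTGACTCAGG", "GGTGACTCAT"]
toy_negatives = ["ACGTACGTAC", "TTTTGGGGCC", "CCCCAAAATT"]
kmer, gain = best_discriminative_kmer(toy_positives, toy_negatives, k=7)
```

On the toy data, `kmer` is `"TGACTCA"` and `gain` is 1.0, because that 7-mer occurs in every positive sequence and in no negative one. A full tree would apply the same test recursively to each resulting subset. The clustering of similar substrings and the profile-HMM refinement described in the abstract are not covered by this sketch.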
