{"title":"Discriminative Detection of Cis-Acting Regulatory Variation From Location Data","authors":"Yu Kawada, Y. Sakakibara","doi":"10.1142/9781860947292_0012","DOIUrl":null,"url":null,"abstract":"The interaction between transcription factors and their DNA binding sites plays a key role for understanding gene regulation mechanisms. Recent studies revealed the presence of ifunctional polymorphismi in genes that is dened as regulatory variation measured in transcription levels due to the cis-acting sequence differences. These regulatory variants are assumed to contribute to modulating gene functions. However, computational identica tions of such functional cis-regulatory variants is a much greater challenge than just identifying consensus sequences, because cis-regulatory variants differ by only a few bases from the main consensus sequences, while they have important consequences for organismal phenotype. None of the previous studies have directly addressed this problem. We propose a novel discriminative detection method for precisely identifying transcription factor binding sites and their functional variants from both positive and negative samples (sets of upstream sequences of both bound and unbound genes by a transcription factor) based on the genome-wide location data. Our goal is to nd such discriminative substrings that best explain the location data in the sense that the substrings precisely discriminate the positive samples from the negative ones rather than nding the substrings that are simply over-represented among the positive ones. Our method consists of two steps: First, we apply a decision tree learning method to discover discriminative substrings and a hierarchical relationship among them. Second, we extract a main motif and further a second motif as a cis-regulatory variant by utilizing functional annotations. Our genome-wide experimental results on yeast Saccharomyces cerevisiae show that our method presented signicantly better performances for detecting experimentally veried consensus sequences than current motif detecting methods. In addition, our method has successfully discovered second motifs of putative functional cis-regulatory variants which are associated with genes of different functional annotations, and the correctness of those variants have been veried by expression prole analyses.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"172 ","pages":"89-98"},"PeriodicalIF":0.0000,"publicationDate":"2005-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... Asia-Pacific bioinformatics conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/9781860947292_0012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
The interaction between transcription factors and their DNA binding sites plays a key role for understanding gene regulation mechanisms. Recent studies revealed the presence of ifunctional polymorphismi in genes that is dened as regulatory variation measured in transcription levels due to the cis-acting sequence differences. These regulatory variants are assumed to contribute to modulating gene functions. However, computational identica tions of such functional cis-regulatory variants is a much greater challenge than just identifying consensus sequences, because cis-regulatory variants differ by only a few bases from the main consensus sequences, while they have important consequences for organismal phenotype. None of the previous studies have directly addressed this problem. We propose a novel discriminative detection method for precisely identifying transcription factor binding sites and their functional variants from both positive and negative samples (sets of upstream sequences of both bound and unbound genes by a transcription factor) based on the genome-wide location data. Our goal is to nd such discriminative substrings that best explain the location data in the sense that the substrings precisely discriminate the positive samples from the negative ones rather than nding the substrings that are simply over-represented among the positive ones. Our method consists of two steps: First, we apply a decision tree learning method to discover discriminative substrings and a hierarchical relationship among them. Second, we extract a main motif and further a second motif as a cis-regulatory variant by utilizing functional annotations. Our genome-wide experimental results on yeast Saccharomyces cerevisiae show that our method presented signicantly better performances for detecting experimentally veried consensus sequences than current motif detecting methods. In addition, our method has successfully discovered second motifs of putative functional cis-regulatory variants which are associated with genes of different functional annotations, and the correctness of those variants have been veried by expression prole analyses.