A structure-based approach to predicting in vitro transcription factor-DNA interaction

2013 IEEE International Workshop on Genomic Signal Processing and Statistics Pub Date : 2013-11-01 DOI:10.1109/GENSIPS.2013.6735915

Zhenzhu Gao, Jianhua Ruan


Abstract

Summary form only given. Understanding the mechanism of transcriptional regulation remains an inspiring area of molecular biology. Among the popular methods for modeling transcription factor binding sites (TFBSs), position-specific weight matrices and k-mer-based approaches have been highly successful. However, both approaches fail to consider the structural properties of a binding site. Bauer et al. (2010) recently presented a novel TFBS modeling and prediction approach that uses sequence-specific chemical and structural features of DNA. However, the in vivo protein-DNA interactions observed in the ChIP-chip assays used in that study are not necessarily direct, because some transcription factors (TFs) interact with DNA largely through other partners. An evaluation on a suitable in vitro dataset would therefore be more appropriate for revealing the benefit of such physicochemical features in modeling TF-DNA interactions.

Recently, in vitro protein-binding microarray (PBM) experiments have greatly improved the understanding of TF-DNA interaction. A PBM is a high-throughput experiment that measures the in vitro binding affinity of a given TF to the sequences on a probe array. Because typical confounding factors present in ChIP-based experiments, such as transcription co-factors, are eliminated, PBM data provide an excellent source of information for developing structural models of TF-DNA interactions. On the other hand, directly mapping 3-mer or 4-mer based meta-features to the candidate DNA binding sequences, as in the work of Bauer et al., may not reflect the nature of TF-DNA binding, since a TFBS is usually 8 to 12 base pairs long. As a result, conventional machine-learning algorithms, which rely on well-structured pairs of feature vectors and labels, may not work well for modeling PBM data.

In this paper we propose a novel approach that predicts in vitro transcription factor binding from the structural properties of DNA using a multiple-instance learning algorithm. Compared with conventional (single-instance) learning algorithms, our multiple-instance learning algorithm does not require knowledge of the actual binding site within a candidate probe sequence, yet can still take full advantage of physicochemical properties in modeling and predicting TF-DNA interactions. Evaluation on in vitro protein-binding microarray data for twenty mouse TFs shows that our new model performs significantly better than several k-mer-based or structure-based single-instance learning algorithms. This indicates that combining multiple-instance learning with the structural properties of DNA has promising potential for studying biological regulatory networks.
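The multiple-instance idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature set, the window width, or the learning algorithm. The sketch assumes that each probe is treated as a bag whose instances are all contiguous windows of a fixed width, that each window is described by one value per dinucleotide step, and that the bag score is the maximum instance score (a common multiple-instance assumption). The names `windows`, `featurize`, `bag_score`, and `TOY_TABLE` are hypothetical, and the table holds placeholder values (the G/C count of each dinucleotide), not measured structural properties.

```python
from typing import Dict, List

# Placeholder "structural" table: G/C count per dinucleotide.
# Illustrative stand-in only; a real model would use measured
# physicochemical properties of DNA instead.
TOY_TABLE: Dict[str, float] = {
    a + b: float((a in "GC") + (b in "GC"))
    for a in "ACGT"
    for b in "ACGT"
}


def windows(probe: str, width: int) -> List[str]:
    """Return every contiguous window of `width` bases (the bag's instances)."""
    return [probe[i:i + width] for i in range(len(probe) - width + 1)]


def featurize(window: str, table: Dict[str, float]) -> List[float]:
    """Describe a window by one table value per overlapping dinucleotide step."""
    return [table[window[i:i + 2]] for i in range(len(window) - 1)]


def bag_score(probe: str, width: int, table: Dict[str, float],
              weights: List[float]) -> float:
    """Score a probe as the best linear score over all of its windows.

    The location of the true binding site is never supplied: the maximum
    over instances lets the highest-scoring window stand in for it.
    """
    instances = windows(probe, width)
    if not instances:
        raise ValueError("probe is shorter than the window width")
    if len(weights) != width - 1:
        raise ValueError("need one weight per dinucleotide step")
    return max(
        sum(w * x for w, x in zip(weights, featurize(inst, table)))
        for inst in instances
    )


if __name__ == "__main__":
    # Hypothetical 6-base probe scored with 4-base windows and unit weights.
    print(bag_score("ACGTAC", 4, TOY_TABLE, [1.0, 1.0, 1.0]))
```

In a full pipeline the weights would be learned from PBM binding intensities rather than fixed by hand, and the window width would be chosen to match the typical binding-site length of 8 to 12 base pairs mentioned above.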


