SMASHing regulatory sites in DNA by human-mouse sequence comparisons.

Proceedings. IEEE Computer Society Bioinformatics Conference, vol. 2, pp. 277-86. Pub Date: 2003-01-01

Mihaela Zavolan, Nicholas D Socci, Nikolaus Rajewsky, Terry Gaasterland



Abstract

Regulatory sequence elements provide important clues to understanding and predicting gene expression. Although the binding sites for hundreds of transcription factors are known, there has been no systematic attempt to incorporate this information in the annotation of the human genome. Cross-species sequence comparisons are critical to a meaningful annotation of regulatory elements, since these elements generally reside in conserved non-coding regions. To take advantage of the recently completed drafts of the mouse and human genomes for annotating transcription factor binding sites, we developed SMASH, a computational pipeline that identifies thousands of orthologous human/mouse proteins, maps them to genomic sequences, extracts and compares upstream regions, and annotates putative regulatory elements in conserved, non-coding, upstream regions. Our current dataset consists of approximately 2,500 human/mouse gene pairs. Transcription start sites were estimated by mapping quasi-full-length cDNA sequences. SMASH uses a novel probabilistic method to identify putative conserved binding sites that takes into account the competition between transcription factors for binding DNA. SMASH presents the results via a genome browser web interface which displays the predicted regulatory information together with the current annotations for the human genome. Our results are validated by comparison to previously published experimental data. SMASH results compare favorably to other existing computational approaches.
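The abstract does not describe how SMASH's probabilistic method works, so the following is only a rough illustration of what "competition between transcription factors for binding DNA" can mean computationally. It is not the SMASH algorithm. The sketch assumes a generic segmentation model: a sequence is parsed into background bases and non-overlapping motif sites, every parse is weighted by priors and position weight matrix (PWM) likelihoods, and the posterior of a site is the share of total parse weight that contains it. All names (`consensus_pwm`, `site_posteriors`, the toy motifs, the weights) are hypothetical, and the priors are unnormalized weights rather than fitted parameters.

```python
from typing import Dict, List, Tuple

Pwm = List[Dict[str, float]]  # one base -> probability dict per motif column

# Assumption: a uniform background model over the four bases.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def consensus_pwm(consensus: str, hit: float = 0.85) -> Pwm:
    """Build a toy PWM that favours the consensus base in each column."""
    miss = (1.0 - hit) / 3.0
    return [{b: (hit if b == c else miss) for b in "ACGT"} for c in consensus]


def site_likelihood(pwm: Pwm, seq: str, start: int) -> float:
    """Probability of the window seq[start:start+len(pwm)] under the PWM."""
    prob = 1.0
    for offset, column in enumerate(pwm):
        prob *= column[seq[start + offset]]
    return prob


def site_posteriors(
    seq: str,
    motifs: Dict[str, Pwm],
    priors: Dict[str, float],
    bg_prior: float,
) -> Dict[Tuple[str, int], float]:
    """Posterior probability of each (motif, start) site over all parses."""
    n = len(seq)
    # Weight of placing each motif at each start position.
    weights = {
        (name, i): priors[name] * site_likelihood(pwm, seq, i)
        for name, pwm in motifs.items()
        for i in range(n - len(pwm) + 1)
    }
    # Weight of explaining a single base as background.
    bg = [bg_prior * BACKGROUND[base] for base in seq]

    # forward[i]: total weight of all parses of the prefix seq[:i].
    forward = [0.0] * (n + 1)
    forward[0] = 1.0
    for end in range(1, n + 1):
        total = forward[end - 1] * bg[end - 1]
        for name, pwm in motifs.items():
            start = end - len(pwm)
            if start >= 0:
                total += forward[start] * weights[(name, start)]
        forward[end] = total

    # backward[i]: total weight of all parses of the suffix seq[i:].
    backward = [0.0] * (n + 1)
    backward[n] = 1.0
    for start in range(n - 1, -1, -1):
        total = bg[start] * backward[start + 1]
        for name, pwm in motifs.items():
            end = start + len(pwm)
            if end <= n:
                total += weights[(name, start)] * backward[end]
        backward[start] = total

    z = forward[n]  # partition function: weight of all parses
    return {
        (name, i): forward[i] * w * backward[i + len(motifs[name])] / z
        for (name, i), w in weights.items()
    }


if __name__ == "__main__":
    motifs = {"A": consensus_pwm("TGAC"), "B": consensus_pwm("GACG")}
    priors = {"A": 0.01, "B": 0.01}
    for site, p in sorted(site_posteriors("AATGACGTT", motifs, priors, 0.98).items()):
        print(site, round(p, 4))
```

Because overlapping sites can never appear in the same parse, adding a second factor whose site overlaps the first one necessarily lowers the first site's posterior. That is the sense in which the factors compete in this toy model. A conservation-aware method would additionally need aligned human and mouse sequence, which this sketch leaves out.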
