Qin Zhou, Jose Alberto de la Paz, Alexander D Stanowick, Xingcheng Lin, Faruck Morcos
{"title":"利用全局偶联和高通量测序表征转录因子的DNA识别偏好。","authors":"Qin Zhou, Jose Alberto de la Paz, Alexander D Stanowick, Xingcheng Lin, Faruck Morcos","doi":"10.1093/nar/gkaf592","DOIUrl":null,"url":null,"abstract":"<p><p>DNA-transcription factor (TF) interactions are essential for gene regulation. Fully characterizing TF recognition specificities and identifying their genomic binding targets are important to understand TF function and regulatory networks. Recently, high-throughput sequencing technology HT-SELEX (high-throughput systematic evolution of ligands by exponential enrichment) has been used to measure hundreds of TFs, providing massive datasets that comprise TF binding preferences. However, there is a need to develop comprehensive computational modeling to fully extract and characterize critical TF binding preferences and fail to distinguish genome-wide binding targets. In this study, we developed a global pairwise model called DCA-Scapes trained with experimental HT-SELEX data. Our approach uncovered high-resolution TF recognition specificity landscapes, enabled the prediction of in vivo binding sequences, and was validated with ChIP-seq (ChIP sequencing) data. In addition, the DCA-Scapes model was utilized to refine the locations of binding regions and accurately identify the binding sites within the ChIP-seq enriched peaks. Moreover, we extended our model to cover the entire human genome, uncovering potential TF target sites that exhibit tissue-specific TF recognition across various cellular environments.</p>","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"53 12","pages":""},"PeriodicalIF":13.1000,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12214022/pdf/","citationCount":"0","resultStr":"{\"title\":\"Characterizing DNA recognition preferences of transcription factors using global couplings and high-throughput sequencing.\",\"authors\":\"Qin Zhou, Jose Alberto de la Paz, Alexander D Stanowick, Xingcheng Lin, Faruck Morcos\",\"doi\":\"10.1093/nar/gkaf592\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>DNA-transcription factor (TF) interactions are essential for gene regulation. Fully characterizing TF recognition specificities and identifying their genomic binding targets are important to understand TF function and regulatory networks. Recently, high-throughput sequencing technology HT-SELEX (high-throughput systematic evolution of ligands by exponential enrichment) has been used to measure hundreds of TFs, providing massive datasets that comprise TF binding preferences. However, there is a need to develop comprehensive computational modeling to fully extract and characterize critical TF binding preferences and fail to distinguish genome-wide binding targets. In this study, we developed a global pairwise model called DCA-Scapes trained with experimental HT-SELEX data. Our approach uncovered high-resolution TF recognition specificity landscapes, enabled the prediction of in vivo binding sequences, and was validated with ChIP-seq (ChIP sequencing) data. In addition, the DCA-Scapes model was utilized to refine the locations of binding regions and accurately identify the binding sites within the ChIP-seq enriched peaks. Moreover, we extended our model to cover the entire human genome, uncovering potential TF target sites that exhibit tissue-specific TF recognition across various cellular environments.</p>\",\"PeriodicalId\":19471,\"journal\":{\"name\":\"Nucleic Acids Research\",\"volume\":\"53 12\",\"pages\":\"\"},\"PeriodicalIF\":13.1000,\"publicationDate\":\"2025-06-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12214022/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Nucleic Acids Research\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/nar/gkaf592\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMISTRY & MOLECULAR BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nucleic Acids Research","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/nar/gkaf592","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0
摘要
dna -转录因子(TF)相互作用对基因调控至关重要。充分表征TF识别特异性并确定其基因组结合靶点对于了解TF功能和调控网络至关重要。最近,高通量测序技术HT-SELEX(通过指数富集的高通量配体系统进化)已被用于测量数百个TF,提供包含TF结合偏好的大量数据集。然而,需要开发全面的计算模型来充分提取和表征关键的TF结合偏好,并且无法区分全基因组的结合靶点。在这项研究中,我们开发了一个名为DCA-Scapes的全球两两模型,该模型使用实验HT-SELEX数据进行训练。我们的方法揭示了高分辨率TF识别特异性景观,能够预测体内结合序列,并通过ChIP-seq (ChIP测序)数据进行了验证。此外,利用dca - scape模型细化结合区域的位置,准确识别ChIP-seq富集峰内的结合位点。此外,我们将我们的模型扩展到覆盖整个人类基因组,揭示了在各种细胞环境中表现出组织特异性TF识别的潜在TF靶点。
Characterizing DNA recognition preferences of transcription factors using global couplings and high-throughput sequencing.
DNA-transcription factor (TF) interactions are essential for gene regulation. Fully characterizing TF recognition specificities and identifying their genomic binding targets are important to understand TF function and regulatory networks. Recently, high-throughput sequencing technology HT-SELEX (high-throughput systematic evolution of ligands by exponential enrichment) has been used to measure hundreds of TFs, providing massive datasets that comprise TF binding preferences. However, there is a need to develop comprehensive computational modeling to fully extract and characterize critical TF binding preferences and fail to distinguish genome-wide binding targets. In this study, we developed a global pairwise model called DCA-Scapes trained with experimental HT-SELEX data. Our approach uncovered high-resolution TF recognition specificity landscapes, enabled the prediction of in vivo binding sequences, and was validated with ChIP-seq (ChIP sequencing) data. In addition, the DCA-Scapes model was utilized to refine the locations of binding regions and accurately identify the binding sites within the ChIP-seq enriched peaks. Moreover, we extended our model to cover the entire human genome, uncovering potential TF target sites that exhibit tissue-specific TF recognition across various cellular environments.
期刊介绍:
Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.