TF-loop: deciphering the transcription factor regulatory language for CTCF-mediated chromatin loop based on BERT.

IF 7.7 | CAS Zone 2 (Biology) | JCR Q1 | Biochemical Research Methods

Briefings in Bioinformatics, vol. 27, issue 2 | Pub Date: 2026-03-01 | DOI: 10.1093/bib/bbag162

Yi-Xuan Qi, Hao-Jiang Zhang, Hao-Xiang Tang, Zi-Xuan Zhang, Kai-Yuan Han, Zheng Zhang, Hui Ding, Li Liu, You-Yu Wang


Citations: 0

Abstract

Chromatin looping, which facilitates the three-dimensional (3D) organization of the genome, is essential for the regulation of gene expression. This process relies on the interaction of numerous transcription factors (TFs), particularly CCCTC-binding factor (CTCF) and cohesin, whose dynamic binding patterns orchestrate loop formation. Current computational methods for predicting CTCF-mediated chromatin loops struggle to perform genome-wide predictions, primarily because of the extreme imbalance between positive and negative samples in training datasets. Existing DNA-sequence-based models often fail to capture the complex dynamics of TF binding and the regulatory code behind chromatin looping. To address these challenges, we present TF-loop, a novel TF regulatory language framework designed to predict chromatin loops. This framework conceptualizes TF sequences, defined by the binding positions and orientations of five key TFs, as a structured "TF language." Using the BERT model, TF-loop decodes the latent linguistic patterns embedded in these sequences, facilitating accurate prediction of chromatin loops. Comparative analysis against a state-of-the-art model demonstrates that TF-loop significantly improves prediction accuracy across diverse cell types, even when faced with highly imbalanced datasets. The results highlight the potential of TF-loop to offer a new perspective on decoding the 3D structure of chromatin using natural language processing techniques.
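
The "TF language" idea can be pictured with a small sketch. The abstract does not give the tokenization scheme or name the five TFs, so everything below is an assumption made for illustration: the `BindingSite` record, the `to_tf_sentence`, `build_vocab`, and `encode_pair` helpers, the token format (factor name plus strand), the toy coordinates, and the use of CTCF with the cohesin subunits RAD21 and SMC3 as example factors. It shows one plausible way to turn binding positions and orientations at two candidate loop anchors into a BERT-style integer input, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingSite:
    tf: str        # factor name (illustrative, not the paper's TF list)
    position: int  # genomic coordinate of the site, in bp
    strand: str    # "+" or "-" orientation of the bound motif


def to_tf_sentence(sites, start, end):
    """Return orientation-aware tokens for sites in [start, end), ordered by position."""
    in_window = [s for s in sites if start <= s.position < end]
    in_window.sort(key=lambda s: s.position)
    return [f"{s.tf}{s.strand}" for s in in_window]


def build_vocab(sentences):
    """Map each distinct token to an integer id after BERT-style special tokens."""
    vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "[MASK]": 3, "[UNK]": 4}
    for sentence in sentences:
        for token in sentence:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode_pair(left, right, vocab):
    """Encode two anchor sentences as [CLS] left [SEP] right [SEP]."""
    tokens = ["[CLS]", *left, "[SEP]", *right, "[SEP]"]
    return [vocab.get(t, vocab["[UNK]"]) for t in tokens]


# Toy binding sites around two hypothetical loop anchors.
sites = [
    BindingSite("CTCF", 1_200, "+"),
    BindingSite("RAD21", 1_250, "+"),
    BindingSite("CTCF", 98_700, "-"),
    BindingSite("SMC3", 98_640, "-"),
]

left = to_tf_sentence(sites, 1_000, 2_000)     # tokens at the upstream anchor
right = to_tf_sentence(sites, 98_000, 99_000)  # tokens at the downstream anchor
vocab = build_vocab([left, right])
input_ids = encode_pair(left, right, vocab)
print(left, right, input_ids)
```

In a sketch like this, a sequence classifier would receive `input_ids` for each candidate anchor pair and output a loop or no-loop label; how TF-loop actually constructs and labels its inputs is described in the full paper, not in the abstract.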

Source journal

Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20

Self-citation rate: 13.70%

Articles published: 549

Review time: 6 months

About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.