A novel deep learning framework with dynamic tokenization for identifying chromatin interactions along with motif importance investigation
Authors: Liangcan Li, Xin Li, Hao Wu
Journal: Briefings in Bioinformatics, 26(3), published 2025-05-01
DOI: 10.1093/bib/bbaf289 (https://doi.org/10.1093/bib/bbaf289)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204613/pdf/
Journal impact factor: 6.8 (JCR Q1, Biochemical Research Methods)
Citation count: 0
Abstract
A comprehensive understanding of chromatin interaction networks is crucial for unraveling the regulatory mechanisms of gene expression. While various computational methods have been developed to predict chromatin interactions and address the limitations and high costs of high-throughput experimental techniques, their performance is often overestimated due to the specificity of chromatin interaction data. In this study, we propose Inter-Chrom, a novel deep learning model that integrates dynamic tokenization, DNABERT's word embedding, and the efficient channel attention mechanism to identify chromatin interactions from sequence and genomic features, leveraging a newly curated dataset. Experimental results demonstrate that Inter-Chrom outperforms existing methods on three cell line datasets. Additionally, we propose a novel method for calculating motif importance and analyze the motifs that receive high importance scores under this method, including both extensively studied motifs and others that have received limited attention to date. Inter-Chrom's robustness to input variations and superior ability to leverage sequence features position it as a powerful tool for advancing chromatin interaction research. The source code of Inter-Chrom is freely available at https://github.com/HaoWuLab-Bioinformatics/Inter-Chrom.
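Two of the building blocks named in the abstract are widely documented in general form: overlapping k-mer tokenization of DNA (the input format used by DNABERT) and efficient channel attention (ECA). The sketch below illustrates those generic techniques only. It is not taken from the Inter-Chrom repository; the function names `kmer_tokens` and `eca`, the choice of k, and the kernel weights are hypothetical. The paper's dynamic tokenization scheme is not described in the abstract and is not reproduced here.

```python
import math


def kmer_tokens(seq, k=6):
    """Split a DNA sequence into overlapping k-mers (stride 1).

    This is the fixed-k scheme commonly paired with DNABERT-style
    embeddings; it is NOT the paper's dynamic tokenization.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def eca(features, kernel):
    """Generic efficient channel attention over a [channels][positions] map.

    features: list of channels, each a list of floats over positions.
    kernel:   odd-length list of weights for a 1D convolution that slides
              across neighbouring channels (hypothetical values; in a real
              model these weights are learned).
    """
    # 1. Global average pooling: one descriptor per channel.
    pooled = [sum(ch) / len(ch) for ch in features]

    # 2. 1D convolution across channels with zero padding ("same" size).
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    weights = []
    for c in range(len(pooled)):
        z = sum(w * padded[c + j] for j, w in enumerate(kernel))
        # 3. Sigmoid gate turns the response into a weight in (0, 1).
        weights.append(1.0 / (1.0 + math.exp(-z)))

    # 4. Rescale every position of each channel by its channel weight.
    return [[w * x for x in ch] for w, ch in zip(weights, features)]


if __name__ == "__main__":
    print(kmer_tokens("ACGTAC", k=3))
    print(eca([[1.0, 1.0], [0.0, 0.0]], kernel=[0.0, 1.0, 0.0]))
```

In a trained network the ECA kernel would be learned jointly with the rest of the model and applied to embedded token features rather than to hand-written lists.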
About the journal:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.