CacPred: a cascaded convolutional neural network for TF-DNA binding prediction.

IF 3.5 · Zone 2 (Biology) · JCR Q2, BIOTECHNOLOGY & APPLIED MICROBIOLOGY

BMC Genomics Pub Date : 2025-03-18 DOI:10.1186/s12864-025-11399-y

Shuangquan Zhang, Anjun Ma, Xuping Xie, Zhichao Lian, Yan Wang

BMC Genomics, volume 26 Suppl 2, page 264. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11916463/pdf/

Abstract

Background: Transcription factors (TFs) regulate gene expression by binding to DNA sequences. Aligned TF binding sites (TFBSs) of the same TF are regarded as cis-regulatory motifs, and substantial computational effort has been invested in finding them. In recent years, convolutional neural networks (CNNs) have succeeded in TF-DNA binding prediction, but the accuracy of existing deep learning (DL) methods needs to improve, and the role of convolution in TF-DNA binding prediction should be explored further.
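To make the link between convolution and motifs concrete, the following minimal sketch one-hot encodes a DNA string and slides a single filter along it, so that the filter behaves like a motif scanner. It is an illustration only and is not taken from the CacPred repository: the function names, the toy sequence, and the 4-bp filter for "TGAC" are all assumptions.

```python
# Illustrative sketch only; names and values are hypothetical, not from CacPred.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 floats in A, C, G, T order.

    Any other symbol (for example N) becomes an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_scan(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence; return one score per offset."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for k in range(width):
            for c in range(4):
                score += encoded[start + k][c] * kernel[k][c]
        scores.append(score)
    return scores

# Hypothetical filter that rewards an exact match to "TGAC".
toy_kernel = one_hot("TGAC")

scores = conv1d_scan(one_hot("AATGACGT"), toy_kernel)
best = max(range(len(scores)), key=scores.__getitem__)  # offset of the strongest match
```

In a trained CNN the filter weights are learned rather than fixed, which is why learned filters can later be inspected as motifs.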

Results: We develop a cascaded convolutional neural network model named CacPred and evaluate it on TF-DNA binding prediction using 790 chromatin immunoprecipitation sequencing (ChIP-seq) datasets and seven ChIP-nexus (chromatin immunoprecipitation experiments with nucleotide resolution through exonuclease, unique barcode, and single ligation) datasets. We compare CacPred with six existing DL models across nine standard evaluation metrics. CacPred outperforms all comparison models: on the 790 ChIP-seq datasets, the average accuracy (ACC), Matthews correlation coefficient (MCC), and area of eight metrics radar (AEMR) improve by 3.3%, 9.2%, and 6.4%, respectively. On the seven ChIP-nexus datasets, the average ACC, MCC, and AEMR improve by 5.5%, 16.8%, and 12.9%, respectively. To interpret the model, we use motifs to show the features CacPred learned; the results show that CacPred can find some significant motifs in the input sequences.
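For readers unfamiliar with two of the reported metrics, the sketch below computes ACC and MCC from binary labels using their standard confusion-matrix definitions. The label vectors are made-up toy data, the function names are hypothetical, and AEMR is not shown because the abstract does not define it.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy labels (1 = bound, 0 = unbound); not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
acc_value = accuracy(*counts)
mcc_value = mcc(*counts)
```

MCC ranges from -1 to 1 and accounts for all four confusion-matrix cells, so it is less flattering than ACC on imbalanced data.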

Conclusions: CacPred performs better than existing models on ChIP-seq data. The analysis of seven ChIP-nexus datasets agrees with this result: the proposed method also performs best there. CacPred uses only convolutional layers, indicating that the pooling steps in existing models lose some sequence information. The significant motifs found show that CacPred can learn features from input sequences. Overall, this study demonstrates that CacPred is an effective and feasible model for predicting TF-DNA binding. CacPred is freely available at https://github.com/zhangsq06/CacPred.
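The remark about pooling can be illustrated with a toy example: after max pooling, two score tracks whose peaks sit at different positions become identical, so the exact position is no longer recoverable. This is a sketch under assumed values, not the authors' experiment; the function name, window size, and scores are hypothetical.

```python
def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial window is kept."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]

# Hypothetical per-position filter scores for two different sequences.
scores_a = [0.0, 4.0, 0.0, 0.0, 1.0, 0.0]
scores_b = [0.0, 0.0, 4.0, 0.0, 0.0, 1.0]

pooled_a = max_pool(scores_a, 3)
pooled_b = max_pool(scores_b, 3)

# The inputs differ, but their pooled summaries do not.
same_after_pooling = pooled_a == pooled_b
```

A model that passes unpooled feature maps to the next convolution keeps this positional detail, at the cost of larger intermediate tensors.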

Source journal

BMC Genomics (Biology: Biotechnology & Applied Microbiology)

CiteScore: 7.40

Self-citation rate: 4.50%

Articles published: 769

Review time: 6.4 months

Journal description: BMC Genomics is an open access, peer-reviewed journal that considers articles on all aspects of genome-scale analysis, functional genomics, and proteomics. BMC Genomics is part of the BMC series, which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.