TFBSFootprinter: a multiomics tool for prediction of transcription factor binding sites in vertebrate species.

Impact factor 3.6; JCR Q2 (Biochemistry & Molecular Biology)
Transcription-Austin. Publication date: 2025-04-01; Epub date: 2025-07-11. DOI: 10.1080/21541264.2025.2521764
Harlan R Barker, Seppo Parkkila, Martti E E Tolvanen
Volume 16, issue 2-3, pages 204-223. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12258250/pdf/
Citations: 0

Abstract

Background: Transcription factor (TF) proteins play a critical role in regulating eukaryotic gene expression through sequence-specific binding to genomic locations known as transcription factor binding sites (TFBSs). Accurate prediction of TFBSs is essential for understanding gene regulation and disease mechanisms and for drug discovery. Such studies are relevant not only in humans but also in model organisms and in domesticated and wild animals. However, current tools for the automatic analysis of TFBSs in gene promoter regions have limited usability across species; to our knowledge, no existing tool supports automatic analysis of TFBSs in gene promoter regions for many species.

Methodology and findings: TFBSFootprinter combines multiomic, transcription-relevant data to predict functional TFBSs more accurately in 317 vertebrate species. For human, these data comprise vertebrate sequence conservation (GERP), proximity to transcription start sites (FANTOM5), expression correlation between target genes and the TFs predicted to bind their promoters (FANTOM5), overlap with ChIP-Seq TF metaclusters (GTRD), overlap with ATAC-Seq peaks (ENCODE), eQTLs (GTEx), and the observed/expected CpG ratio (Ensembl). For non-human vertebrates, they comprise GERP, proximity to transcription start sites, and the CpG ratio. For ease of use, TFBSFootprinter analyses are keyed on an Ensembl transcript ID and require minimal setup. When benchmarked on a manually curated dataset of experimentally verified TFBSs, TFBSFootprinter using all multiomic data achieved an average area under the receiver operating characteristic curve (AUROC) of 0.881, outperforming DeepBind (0.798), DeepSEA (0.682), FIMO (0.817), and traditional PWM scoring (0.854). Selecting the best overall combination of multiomic data raised the average AUROC further, to 0.910. We also identified, for each TF, the combination of multiomic data that best models its binding. TFBSFootprinter is available as Conda and Python packages.
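Of the features listed above, the observed/expected CpG ratio is the one that can be computed directly from sequence. The minimal sketch below uses the classic Gardiner-Garden and Frommer definition, (CpG count × sequence length) / (C count × G count). It is an assumption that TFBSFootprinter uses this exact variant. The helper `cpg_obs_exp_around` and its `flank` window size are hypothetical and included only for illustration.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio (Gardiner-Garden & Frommer, 1987).

    Returns (CpG count * sequence length) / (C count * G count),
    or 0.0 when the sequence contains no C or no G.
    """
    s = seq.upper()
    c = s.count("C")
    g = s.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count's non-overlapping count is exact.
    cpg = s.count("CG")
    return cpg * len(s) / (c * g)


def cpg_obs_exp_around(seq: str, start: int, end: int, flank: int = 50) -> float:
    """Hypothetical helper: CpG o/e in a window around a candidate site.

    The window is seq[start - flank : end + flank], clipped to the sequence.
    The 50 bp default flank is an illustrative assumption, not a published value.
    """
    lo = max(0, start - flank)
    hi = min(len(seq), end + flank)
    return cpg_obs_exp(seq[lo:hi])


if __name__ == "__main__":
    promoter = "AAAA" + "CGCG" + "AAAA"
    print(cpg_obs_exp("CCGG"))                          # 1.0
    print(cpg_obs_exp_around(promoter, 4, 8, flank=0))  # 2.0
    print(cpg_obs_exp_around(promoter, 4, 8, flank=4))  # 6.0
```

On very short windows the ratio is noisy. For comparison, the classic CpG-island criteria of Gardiner-Garden and Frommer use stretches of at least 200 bp with GC content above 50% and an observed/expected ratio above 0.6.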

Source journal: Transcription-Austin (Biochemistry & Molecular Biology). CiteScore: 6.50; self-citation rate: 5.60%; articles published: 9