Ultra-fast variant effect prediction using biophysical transcription factor binding models

IF 13.1, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY
Rezwan Hosseini, Ali Tugrul Balci, Dennis Kostka, Nathan Clark, Maria Chikina
DOI: 10.1093/nar/gkaf940 (https://doi.org/10.1093/nar/gkaf940)
Journal: Nucleic Acids Research
Publication date: 2025-10-04
Publication type: Journal Article
Citations: 0

Abstract

Ultra-fast variant effect prediction using biophysical transcription factor binding models
Sequence variation within transcription factor (TF)-binding sites can significantly affect TF–DNA interactions, influencing gene expression and contributing to disease susceptibility or phenotypic traits. Despite recent progress in deep sequence-to-function models that predict functional output from sequence data, these methods perform inadequately on some variant effect prediction tasks, especially with common genetic variants. This limitation underscores the importance of leveraging biophysical models of TF binding to enhance interpretability of variant effect scores and facilitate mechanistic insights. We introduce motifDiff, a novel computational tool designed to quantify variant effects using mono- and dinucleotide position weight matrices. motifDiff offers several key advantages, including scalability to score millions of variants within minutes, implementation of a statistically rigorous normalization strategy critical for optimal performance, and support for both dinucleotide and mononucleotide models. We demonstrate motifDiff’s efficacy by evaluating it across diverse ground truth datasets that quantify the effects of common variants in vivo, thereby establishing robust benchmarks for the predictive value of variant effect calculations. Finally, we show that our tool provides unique insights when interpreting human accelerated regions. motifDiff is available as a standalone Python application at https://github.com/rezwanhosseini/MotifDiff.
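To make the idea of scoring a variant with a mononucleotide position weight matrix concrete, here is a minimal sketch. It is not motifDiff's implementation or API: the toy motif, the uniform background, the forward-strand-only scan, and the function names (`log_odds_pwm`, `best_window_score`, `variant_effect`) are all assumptions chosen for illustration, and the normalization strategy mentioned in the abstract is not reproduced.

```python
import math

BASES = "ACGT"

def log_odds_pwm(probs, background=0.25):
    """Convert per-position base probabilities into log2-odds weights
    against a uniform background (an assumption of this sketch)."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in probs]

def best_window_score(seq, pwm, pos):
    """Highest PWM score among forward-strand windows that cover index pos."""
    width = len(pwm)
    best = -math.inf
    first = max(0, pos - width + 1)
    last = min(pos, len(seq) - width)
    for start in range(first, last + 1):
        window = seq[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        best = max(best, score)
    return best

def variant_effect(seq, pos, alt, pwm):
    """Alternate-allele best score minus reference-allele best score."""
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return best_window_score(alt_seq, pwm, pos) - best_window_score(seq, pwm, pos)

# Hypothetical 3-bp motif that strongly prefers "TGA".
toy_probs = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]
pwm = log_odds_pwm(toy_probs)

# A G>C substitution at index 3 disrupts the central base of the "TGA" site.
delta = variant_effect("CCTGACC", 3, "C", pwm)
print(f"variant effect (log2-odds difference): {delta:.3f}")
```

A negative value here means the alternate allele weakens the best match to the motif. A real analysis would also scan the reverse strand and many motifs at once, which this sketch leaves out.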
Source journal: Nucleic Acids Research (Biology: Biochemistry & Molecular Biology)
CiteScore: 27.10
Self-citation rate: 4.70%
Articles published: 1057
Review time: 2 months
About the journal: Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.