Rezwan Hosseini, Ali Tugrul Balci, Dennis Kostka, Nathan Clark, Maria Chikina
Nucleic Acids Research. Published 2025-10-04. DOI: 10.1093/nar/gkaf940 (https://doi.org/10.1093/nar/gkaf940)
Ultra-fast variant effect prediction using biophysical transcription factor binding models
Sequence variation within transcription factor (TF)-binding sites can significantly affect TF–DNA interactions, influencing gene expression and contributing to disease susceptibility or phenotypic traits. Despite recent progress in deep sequence-to-function models that predict functional output from sequence data, these methods perform inadequately on some variant effect prediction tasks, especially with common genetic variants. This limitation underscores the importance of leveraging biophysical models of TF binding to enhance interpretability of variant effect scores and facilitate mechanistic insights. We introduce motifDiff, a novel computational tool designed to quantify variant effects using mono- and dinucleotide position weight matrices. motifDiff offers several key advantages, including scalability to score millions of variants within minutes, implementation of a statistically rigorous normalization strategy critical for optimal performance, and support for both dinucleotide and mononucleotide models. We demonstrate motifDiff’s efficacy by evaluating it across diverse ground truth datasets that quantify the effects of common variants in vivo, thereby establishing robust benchmarks for the predictive value of variant effect calculations. Finally, we show that our tool provides unique insights when interpreting human accelerated regions. motifDiff is available as a standalone Python application at https://github.com/rezwanhosseini/MotifDiff.
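To make the idea of scoring a variant with a mononucleotide position weight matrix concrete, here is a minimal sketch. It is not motifDiff's implementation or API: the function names, the toy 3-bp matrix, the uniform background, and the "best window score, alt minus ref" definition are all assumptions made for illustration. The normalization strategy and dinucleotide models mentioned in the abstract are not reproduced, and only the forward strand is scanned.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) one-hot matrix in A, C, G, T order."""
    idx = np.array([BASES.index(b) for b in seq])
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), idx] = 1.0
    return out

def pwm_scores(seq, pwm):
    """Log-odds score of every window of length pwm.shape[0] along seq."""
    x = one_hot(seq)
    width = pwm.shape[0]
    return np.array([(x[i:i + width] * pwm).sum()
                     for i in range(len(seq) - width + 1)])

def variant_effect(ref_seq, pos, alt, pwm):
    """Hypothetical effect score for a single-nucleotide variant.

    Returns (best alt window score) - (best ref window score), considering
    only windows that overlap the 0-based position `pos`.
    Assumes len(ref_seq) >= pwm.shape[0].
    """
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    width = pwm.shape[0]
    lo = max(0, pos - width + 1)          # first base of leftmost overlapping window
    hi = min(len(ref_seq), pos + width)   # one past last base of rightmost window
    ref_best = pwm_scores(ref_seq[lo:hi], pwm).max()
    alt_best = pwm_scores(alt_seq[lo:hi], pwm).max()
    return alt_best - ref_best

# Toy motif favouring "TGA"; rows are positions, columns are A, C, G, T.
# These probabilities are invented for the example.
toy_probs = np.array([
    [0.1, 0.1, 0.1, 0.7],
    [0.1, 0.1, 0.7, 0.1],
    [0.7, 0.1, 0.1, 0.1],
])
toy_pwm = np.log2(toy_probs / 0.25)  # log-odds against a uniform background

# A G>C change at the middle of the "TGA" site weakens the best match.
effect = variant_effect("CCTGACC", pos=3, alt="C", pwm=toy_pwm)
print(f"effect of G>C at position 3: {effect:.3f}")
```

In this toy case the best-scoring window is the same before and after the change, so the score reduces to the log-ratio of the two allele probabilities at the affected motif position. A negative value indicates predicted loss of binding, a positive value predicted gain.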
About the journal:
Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.