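Position Weight Matrix or Acyclic Probabilistic Finite Automaton: Which model to use? A decision rule inferred for the prediction of transcription factor binding sites.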

IF 1.7 · Region 4 (Biology) · JCR Q4 · Biochemistry & Molecular Biology
Genetics and Molecular Biology Pub Date : 2024-01-19 eCollection Date: 2024-01-01 DOI:10.1590/1678-4685-GMB-2023-0048
Guilherme Miura Lavezzo, Marcelo de Souza Lauretto, Luiz Paulo Moura Andrioli, Ariane Machado-Lima
Citations: 0

Abstract


Prediction of transcription factor binding sites (TFBSs) is a bioinformatics application in which DNA molecules are represented as sequences of the symbols A, C, G and T. The most widely used model for this problem is the Position Weight Matrix (PWM). Although PWMs have the advantage of simplicity, they cannot capture dependencies between nucleotide positions, which may affect prediction performance. The Acyclic Probabilistic Finite Automaton (APFA) is an alternative model able to accommodate position dependencies. However, the APFA is a more complex model, which means that more parameters have to be learned. In this paper, we propose an innovative method to identify when position dependencies influence the preference for PWMs or APFAs. This involved using position dependency features extracted from 1106 sets of TFBSs to infer a decision tree able to predict which model, PWM or APFA, is best for a given set of TFBSs. According to our results, as few as three selected features suffice to choose the best model, providing a balance between performance (average precision) and model simplicity.
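
The limitation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which position dependency features were used, so mutual information between two columns is an assumption chosen here purely for illustration, and the function names (`build_pwm`, `score`, `mutual_information`), the pseudocount, the uniform background, and the toy sites are all hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned sites.

    Each column is estimated independently of all others, which is
    exactly why a PWM cannot represent dependencies between positions.
    """
    length = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm

def score(pwm, seq):
    """Sum of per-position log-odds scores for a candidate sequence."""
    return sum(column[base] for column, base in zip(pwm, seq))

def mutual_information(sites, i, j):
    """Empirical mutual information (in bits) between columns i and j.

    Used here as one possible dependency feature (an assumption).
    """
    n = len(sites)
    p_i = Counter(s[i] for s in sites)
    p_j = Counter(s[j] for s in sites)
    p_ij = Counter((s[i], s[j]) for s in sites)
    return sum(
        (c / n) * math.log2((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
        for (a, b), c in p_ij.items()
    )

# Toy data: positions 0 and 1 always agree (AA or CC), position 2 is fixed.
sites = ["AAG", "CCG", "AAG", "CCG"]
pwm = build_pwm(sites)

seen_score = score(pwm, "AAG")    # pattern present in the training sites
unseen_score = score(pwm, "ACG")  # pattern never observed (breaks the pairing)
mi_01 = mutual_information(sites, 0, 1)
mi_02 = mutual_information(sites, 0, 2)
```

In this toy set the PWM assigns the same score to the observed site `AAG` and the never-observed `ACG`, while the mutual information between positions 0 and 1 is high. A feature of this kind could in principle feed a decision tree that chooses between PWM and APFA, but the features actually selected in the paper are not given in the abstract.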

Source journal
Genetics and Molecular Biology (Biology: Biochemistry & Molecular Biology)
CiteScore: 4.20
Self-citation rate: 4.80%
Articles published: 111
Review time: 3 months
Journal description: Genetics and Molecular Biology (formerly named Revista Brasileira de Genética/Brazilian Journal of Genetics - ISSN 0100-8455) is published by the Sociedade Brasileira de Genética (Brazilian Society of Genetics). The Journal considers contributions that present the results of original research in genetics, evolution and related scientific disciplines. Manuscripts presenting methods and applications only, without an analysis of genetic data, will not be considered.