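Position Weight Matrix or Acyclic Probabilistic Finite Automaton: Which model to use? A decision rule inferred for the prediction of transcription factor binding sites.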

IF 1.3; Zone 4 (Biology); JCR Q4, BIOCHEMISTRY & MOLECULAR BIOLOGY

Genetics and Molecular Biology. Pub Date: 2024-01-19; eCollection Date: 2024-01-01; DOI: 10.1590/1678-4685-GMB-2023-0048

Guilherme Miura Lavezzo, Marcelo de Souza Lauretto, Luiz Paulo Moura Andrioli, Ariane Machado-Lima

Volume 46(4), e20230048. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10945726/pdf/

Citations: 0

Abstract




Prediction of transcription factor binding sites (TFBS) is an example of application of Bioinformatics where DNA molecules are represented as sequences of A, C, G and T symbols. The most used model in this problem is Position Weight Matrix (PWM). Notwithstanding the advantage of being simple, PWMs cannot capture dependency between nucleotide positions, which may affect prediction performance. Acyclic Probabilistic Finite Automata (APFA) is an alternative model able to accommodate position dependencies. However, APFA is a more complex model, which means more parameters have to be learned. In this paper, we propose an innovative method to identify when position dependencies influence preference for PWMs or APFAs. This implied using position dependency features extracted from 1106 sets of TFBS to infer a decision tree able to predict which is the best model - PWM or APFA - for a given set of TFBSs. According to our results, as few as three pinpointed features are able to choose the best model, providing a balance of performance (average precision) and model simplicity.
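To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It builds a log-odds PWM from a toy set of aligned sites and computes the mutual information between two positions as one illustrative position-dependency measure. The function names, the pseudocount, the uniform background, the toy sites, and the choice of mutual information are all assumptions made for illustration; the abstract does not say which dependency features the decision tree uses.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM (one dict per position) from aligned sites.

    pseudocount and the uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm

def score(pwm, seq):
    """Score a sequence as a sum of independent per-position log-odds."""
    return sum(column[base] for column, base in zip(pwm, seq))

def mutual_information(sites, i, j):
    """Mutual information in bits between positions i and j of the sites."""
    n = len(sites)
    p_i = Counter(s[i] for s in sites)
    p_j = Counter(s[j] for s in sites)
    p_ij = Counter((s[i], s[j]) for s in sites)
    return sum(
        (c / n) * math.log2((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
        for (a, b), c in p_ij.items()
    )

# Toy data (hypothetical): positions 0 and 1 always co-vary (AC or TG).
sites = ["ACGT", "ACGT", "TGGT", "TGGT"]
pwm = build_pwm(sites)

# The PWM gives the never-observed combination "AGGT" the same score as
# the observed site "ACGT", because each position is scored independently.
same_score = math.isclose(score(pwm, "ACGT"), score(pwm, "AGGT"))

# Mutual information exposes the dependency the PWM ignores.
mi_dependent = mutual_information(sites, 0, 1)
mi_constant = mutual_information(sites, 2, 3)
```

In this toy set the dependent pair of positions carries nonzero mutual information while the two constant positions carry none. A model that conditions on earlier positions, such as an APFA, could tell "ACGT" from "AGGT", at the cost of more parameters to learn.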


Source journal

Genetics and Molecular Biology (Biology: Biochemistry and Molecular Biology)

CiteScore: 4.20
Self-citation rate: 4.80%
Articles published: 111
Review time: 3 months

About the journal: Genetics and Molecular Biology (formerly named Revista Brasileira de Genética/Brazilian Journal of Genetics - ISSN 0100-8455) is published by the Sociedade Brasileira de Genética (Brazilian Society of Genetics). The Journal considers contributions that present the results of original research in genetics, evolution and related scientific disciplines. Manuscripts presenting methods and applications only, without an analysis of genetic data, will not be considered.