Trie-PMS8: A trie-tree based robust solution for planted motif search problem

Mohammad Hasan, Abu Saleh Musa Miah, Md. Humaun Kabir, Mahmudul Alam
Journal: International Journal of Cognitive Computing in Engineering, vol. 5, pp. 332-342 (2024)
DOI: 10.1016/j.ijcce.2024.07.004
URL: https://www.sciencedirect.com/science/article/pii/S2666307424000251
Citations: 0

Abstract

Finding patterns in biological sequences is a crucial and intriguing task. This paper explores the (Ɩ, d) motif search problem, also known as Planted Motif Search (PMS), which is NP-hard. PMS and (Ɩ, d) motif search algorithms are believed to represent the next generation of tools for motif discovery. Given n biological sequences and two parameters, Ɩ and d, PMS identifies the sequences of length Ɩ that occur in every input string with at most d mismatches. Many existing exact PMS algorithms exhibit exponential time complexity in the worst case. This paper introduces an algorithm that focuses on improving the efficiency of the sample-driven portion of the process. Specifically, dynamic programming techniques are employed to avoid redundant calculations in frequently used subtrees. The paper also presents novel approaches to enhance algorithm performance, such as using a trie that significantly reduces the time for the “sort rows by size” step. The trie also reduces the space taken by the linked lists in LL-PMS8 (Hasan et al., Jun., 2022) or reduces the number of Ɩ-mers. Using the trie as the main means of acceleration gives much better results than older PMS methods such as LL-PMS8 (Hasan et al., Jun., 2022). Overall, time is reduced by 26.17 % and 16.48 % compared with the previous method on real-world and generated datasets, respectively (Hasan et al., 2020).
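
To make the problem statement concrete, the following is a minimal Python sketch of the (Ɩ, d) motif search definition given in the abstract, using a trie of Ɩ-mers per input sequence. It is not the Trie-PMS8 algorithm: it simply enumerates every candidate string over the alphabet (a naive pattern-driven search, exponential in Ɩ) and uses the trie only to test whether a sequence contains an Ɩ-mer within d mismatches of the candidate. The names `LmerTrie`, `has_neighbor`, and `planted_motifs`, and the DNA alphabet `ACGT`, are assumptions made for illustration and do not come from the paper.

```python
from itertools import product


class LmerTrie:
    """Trie holding every l-mer (length-l substring) of one sequence.

    Hypothetical helper for illustration; nodes are plain nested dicts.
    """

    def __init__(self, sequence, l):
        self.root = {}
        for start in range(len(sequence) - l + 1):
            node = self.root
            for ch in sequence[start:start + l]:
                node = node.setdefault(ch, {})

    def has_neighbor(self, pattern, d):
        """Return True if some stored l-mer differs from pattern in <= d positions."""

        def walk(node, depth, mismatches):
            if depth == len(pattern):
                return True
            for ch, child in node.items():
                cost = mismatches + (ch != pattern[depth])
                # Prune branches that already exceed the mismatch budget.
                if cost <= d and walk(child, depth + 1, cost):
                    return True
            return False

        return walk(self.root, 0, 0)


def planted_motifs(sequences, l, d, alphabet="ACGT"):
    """Naive search: test all |alphabet|**l candidates against every sequence."""
    tries = [LmerTrie(seq, l) for seq in sequences]
    motifs = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        if all(trie.has_neighbor(candidate, d) for trie in tries):
            motifs.append(candidate)
    return sorted(motifs)


# Toy input: "CGT" is the only 3-mer shared exactly by all three strings.
example = ["AACGT", "CCGTA", "TCGTT"]
print(planted_motifs(example, 3, 0))  # ['CGT']
```

Because Ɩ-mers that share a prefix share a path in the trie, the mismatch count for that prefix is computed once per candidate instead of once per Ɩ-mer. This is the general appeal of a trie for this kind of lookup; how Trie-PMS8 itself organizes and traverses its trie is not described in the abstract.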
