Fast algorithms for window accumulated subsequence matching problem

Impact factor 0.5; Zone 4 (Computer Science); JCR Q4 (COMPUTER SCIENCE, INFORMATION SYSTEMS)
Zdenek Tronicek
Journal: Acta Informatica 63(1)
Publication date: 2026-02-25
Publication type: Journal Article
DOI: 10.1007/s00236-026-00523-4
URL: https://link.springer.com/article/10.1007/s00236-026-00523-4
Citations: 0

Abstract

A subsequence of a string T is any string that can be obtained by removing zero or more symbols from T. The paper deals with the Window Accumulated Subsequence matching Problem (WASP), defined as follows: given two strings, the text T and the pattern P, and a positive integer w (the window size), find the number of substrings of T of length w that contain P as a subsequence. Three algorithms for this problem are introduced: a bit-parallel approach, an algorithm that preprocesses the pattern, and an algorithm that preprocesses the text. The bit-parallel approach outperforms the state-of-the-art algorithm, and the other two algorithms outperform the bit-parallel approach for small alphabets, short patterns, and windows that are not much larger than the pattern. Furthermore, the paper describes a preprocessing of the text that solves WASP for a fixed window size and every possible pattern of a given size. This is beneficial when WASP must be solved for a single text and multiple patterns, because once the text is preprocessed, each solution is available promptly.
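
To make the problem statement concrete, the following Python sketch counts the windows in two ways. It is not any of the paper's three algorithms, which the abstract does not spell out: `wasp_naive` applies the definition directly, and `wasp_next_table` is a generic next-occurrence-table technique shown only to illustrate what preprocessing the text can look like. All function names are hypothetical.

```python
def is_subsequence(pattern: str, window: str) -> bool:
    """True if pattern can be obtained from window by deleting symbols."""
    it = iter(window)
    # `ch in it` advances the iterator past the first match, preserving order.
    return all(ch in it for ch in pattern)


def wasp_naive(text: str, pattern: str, w: int) -> int:
    """Count length-w substrings of text that contain pattern as a subsequence."""
    return sum(
        is_subsequence(pattern, text[i:i + w])
        for i in range(len(text) - w + 1)
    )


def build_next(text: str) -> dict[str, list[int]]:
    """nxt[c][i] = smallest j >= i with text[j] == c, or len(text) if none."""
    n = len(text)
    nxt = {c: [n] * (n + 1) for c in set(text)}
    for c, row in nxt.items():
        for i in range(n - 1, -1, -1):
            row[i] = i if text[i] == c else row[i + 1]
    return nxt


def wasp_next_table(text: str, pattern: str, w: int) -> int:
    """Same count as wasp_naive, using greedy leftmost matching per window.

    Illustrative assumption: a plain next-occurrence table, not the
    text-preprocessing algorithm from the paper.
    """
    n = len(text)
    nxt = build_next(text)
    if any(c not in nxt for c in pattern):
        return 0  # some pattern symbol never occurs in the text
    count = 0
    for start in range(n - w + 1):
        pos = start
        matched = True
        for c in pattern:
            j = nxt[c][pos]
            if j >= start + w:  # next occurrence falls outside the window
                matched = False
                break
            pos = j + 1
        count += matched
    return count
```

For example, with T = "abcabc", P = "ac" and w = 3, the windows are "abc", "bca", "cab", "abc"; only the two copies of "abc" contain "ac" as a subsequence, so both functions return 2. The table costs O(|T| · |Σ|) space, which is one reason such approaches tend to suit small alphabets.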

Source journal: Acta Informatica (Engineering and Technology; Computer Science: Information Systems)
CiteScore: 2.40
Self-citation rate: 16.70%
Articles published: 24
Review time: >12 weeks
Journal description: Acta Informatica provides international dissemination of articles on formal methods for the design and analysis of programs, computing systems and information structures, as well as related fields of Theoretical Computer Science such as Automata Theory, Logic in Computer Science, and Algorithmics. Topics of interest include: • semantics of programming languages • models and modeling languages for concurrent, distributed, reactive and mobile systems • models and modeling languages for timed, hybrid and probabilistic systems • specification, program analysis and verification • model checking and theorem proving • modal, temporal, first- and higher-order logics, and their variants • constraint logic, SAT/SMT-solving techniques • theoretical aspects of databases, semi-structured data and finite model theory • theoretical aspects of artificial intelligence, knowledge representation, description logic • automata theory, formal languages, term and graph rewriting • game-based models, synthesis • type theory, typed calculi • algebraic, coalgebraic and categorical methods • formal aspects of performance, dependability and reliability analysis • foundations of information and network security • parallel, distributed and randomized algorithms • design and analysis of algorithms • foundations of network and communication protocols.