String editing under pattern constraints

IF 0.9 · Zone 4 (Computer Science) · JCR Q3 (Computer Science, Theory & Methods)
Robert D. Barish, Tetsuo Shibuya
Theoretical Computer Science, vol. 1022, Article 114889. Published 2024-10-01. DOI: 10.1016/j.tcs.2024.114889. https://www.sciencedirect.com/science/article/pii/S0304397524005061
Citations: 0

Abstract

We introduce the novel Nearest Pattern Constrained String (NPCS) problem of finding a minimum set $Q$ of character mutation, insertion, and deletion edit operations sufficient to modify a string $x$ to contain all contiguous substrings in a pattern set $P$ and no contiguous substrings in a forbidden pattern set $F$. Letting $\Sigma$ be the alphabet of allowed characters, and letting $\eta$ and $\Upsilon$ be the longest string length and sum of all string lengths in $P \cup F$, respectively, we show that NPCS is fixed-parameter tractable in $|P|$ with time complexity $O(2^{|P|} \cdot \Upsilon \cdot |\Sigma| \cdot (|P| + \eta)(|x| + 1))$. Additionally, we consider a generalization of the NPCS problem in which we allow for constraints based on the membership of substrings in regular languages. In particular, we introduce a problem we denote String Editing under Substring in Language Constraints (StrEdit-SILC), where provided a wildcard-free string $x \in \Sigma^{*}$, a finite set of regular languages $R = \{L_1, L_2, \ldots\}$, and a regular language $L_F$, the objective is to find a minimum cost set of mutation, insertion, and deletion edit operations $Q$ that suffice to convert the input string $x$ into a string $x' \in \Sigma^{*}$, where no substring has membership in $L_F$, and $\forall L_i \in R$, there exists a substring in $L_i$. Here, letting $\Psi$ and $\varpi$ be the sum of all regular expression lengths and longest regular expression length for languages in $R \cup \{L_F\}$, respectively, and letting $C_{mid} \in \mathbb{N}$ be the maximum cost of an edit operation, we show that StrEdit-SILC is fixed-parameter tractable with respect to $\Psi$, having time complexity $O(2^{\Psi} \cdot |x| \cdot (\varpi \cdot |\Sigma| + C_{mid}))$. However, we also show that StrEdit-SILC is MAX-SNP-hard and otherwise difficult to approximate under stringent constraints.
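To make the NPCS definition concrete, the following is a minimal brute-force sketch in Python. It is not the fixed-parameter algorithm from the paper, whose details the abstract does not give. It is a breadth-first search over unit-cost edits, exponential in the edit budget and only practical for toy instances. The names `satisfies`, `neighbors`, `brute_force_npcs`, and the `max_edits` cutoff are hypothetical and introduced here purely for illustration.

```python
from collections import deque


def satisfies(s, required, forbidden):
    """True if s contains every required pattern and no forbidden pattern."""
    return all(p in s for p in required) and not any(f in s for f in forbidden)


def neighbors(s, alphabet):
    """Yield every string one mutation, insertion, or deletion away from s."""
    for i in range(len(s)):
        yield s[:i] + s[i + 1:]                  # delete character i
        for c in alphabet:
            if c != s[i]:
                yield s[:i] + c + s[i + 1:]      # mutate character i to c
    for i in range(len(s) + 1):
        for c in alphabet:
            yield s[:i] + c + s[i:]              # insert c before position i


def brute_force_npcs(x, required, forbidden, alphabet, max_edits=4):
    """Search strings in order of edit count (illustrative, not the paper's method).

    Returns (edit_count, edited_string) for a nearest feasible string,
    or None if none is reachable within max_edits (an assumed cutoff).
    """
    seen = {x}
    queue = deque([(x, 0)])
    while queue:
        s, d = queue.popleft()
        if satisfies(s, required, forbidden):
            return d, s
        if d == max_edits:
            continue
        for t in neighbors(s, alphabet):
            if t not in seen:
                seen.add(t)
                queue.append((t, d + 1))
    return None


# Toy instance: x = "abc", P = {"bd"}, F = {"c"}, alphabet {a, b, c, d}.
result = brute_force_npcs("abc", {"bd"}, {"c"}, "abcd")
```

Because every edit has unit cost here, breadth-first order guarantees that the first feasible string found uses the fewest edits. Supporting the weighted costs of StrEdit-SILC would need a priority queue in place of the plain queue, and regular-language constraints would need a different feasibility check.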
Source journal
Theoretical Computer Science (Engineering & Technology, Computer Science: Theory & Methods)
CiteScore: 2.60
Self-citation rate: 18.20%
Articles published: 471
Review time: 12.6 months
Journal description: Theoretical Computer Science is mathematical and abstract in spirit, but it derives its motivation from practical and everyday computation. Its aim is to understand the nature of computation and, as a consequence of this understanding, provide more efficient methodologies. All papers introducing or studying mathematical, logic and formal concepts and methods are welcome, provided that their motivation is clearly drawn from the field of computing.