String editing under pattern constraints
Robert D. Barish, Tetsuo Shibuya
Theoretical Computer Science, vol. 1022, Article 114889 (published 2024-10-01)
DOI: 10.1016/j.tcs.2024.114889
https://www.sciencedirect.com/science/article/pii/S0304397524005061
Journal metrics: impact factor 0.9; JCR Q3 (Computer Science, Theory & Methods)
Citations: 0
Abstract
We introduce the novel Nearest Pattern Constrained String (NPCS) problem: find a minimum set $Q$ of character mutation, insertion, and deletion edit operations sufficient to modify a string $x$ so that it contains every string in a pattern set $P$ as a contiguous substring and contains no string in a forbidden pattern set $F$ as a contiguous substring. Letting $\Sigma$ be the alphabet of allowed characters, and letting $\eta$ and $\Upsilon$ be the longest string length and the sum of all string lengths in $P \cup F$, respectively, we show that NPCS is fixed-parameter tractable in $|P|$ with time complexity $O(2^{|P|} \cdot \Upsilon \cdot |\Sigma| \cdot (|P| + \eta)(|x| + 1))$. Additionally, we consider a generalization of the NPCS problem in which we allow constraints based on the membership of substrings in regular languages. In particular, we introduce a problem we denote String Editing under Substring in Language Constraints (StrEdit-SILC): given a wildcard-free string $x \in \Sigma^*$, a finite set of regular languages $R = \{L_1, L_2, \ldots\}$, and a regular language $L_F$, the objective is to find a minimum-cost set $Q$ of mutation, insertion, and deletion edit operations that suffice to convert the input string $x$ into a string $x' \in \Sigma^*$ such that no substring of $x'$ is a member of $L_F$ and, for every $L_i \in R$, some substring of $x'$ is a member of $L_i$. Here, letting $\Psi$ and $\varpi$ be the sum of all regular expression lengths and the longest regular expression length, respectively, for the languages in $R \cup \{L_F\}$, and letting $C_{\mathrm{mid}} \in \mathbb{N}$ be the maximum cost of an edit operation, we show that StrEdit-SILC is fixed-parameter tractable with respect to $\Psi$, with time complexity $O(2^{\Psi} \cdot |x| \cdot (\varpi \cdot |\Sigma| + C_{\mathrm{mid}}))$. However, we also show that StrEdit-SILC is MAX-SNP-hard and otherwise difficult to approximate under stringent constraints.
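To make the NPCS problem concrete, here is a small exact solver. It is a minimal sketch, not the authors' algorithm (the abstract does not describe their construction). It assumes unit-cost edits, nonempty patterns over an explicitly given alphabet, and a standard technique chosen for illustration: an Aho-Corasick automaton over $P \cup F$, combined with a bitmask recording which required patterns have appeared, searched with 0-1 BFS over states $(i, s, m)$. Here $i$ is the number of characters of $x$ consumed, $s$ the automaton state, and $m$ the bitmask. The function names `build_aho_corasick` and `npcs_min_edits` are hypothetical.

```python
from collections import deque


def build_aho_corasick(patterns, alphabet):
    """Complete Aho-Corasick DFA over `alphabet` (illustrative helper).

    Returns (delta, out): delta[s][c] is the next state, and out[s] is the set
    of pattern indices ending at state s (merged along failure links).
    """
    goto, out = [{}], [set()]
    for idx, pat in enumerate(patterns):
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto[s][ch] = len(goto)
                goto.append({})
                out.append(set())
            s = goto[s][ch]
        out[s].add(idx)

    fail = [0] * len(goto)
    delta = [{} for _ in goto]
    queue = deque()
    for c in alphabet:
        delta[0][c] = goto[0].get(c, 0)
        if c in goto[0]:
            queue.append(goto[0][c])
    while queue:  # BFS order: failure targets are always finished first
        s = queue.popleft()
        out[s] |= out[fail[s]]
        for c in alphabet:
            if c in goto[s]:
                nxt = goto[s][c]
                fail[nxt] = delta[fail[s]][c]
                delta[s][c] = nxt
                queue.append(nxt)
            else:
                delta[s][c] = delta[fail[s]][c]
    return delta, out


def npcs_min_edits(x, required, forbidden, alphabet):
    """Minimum number of unit-cost edits so the edited string contains every
    string in `required` and none in `forbidden`; None if infeasible."""
    patterns = list(required) + list(forbidden)
    k = len(required)
    delta, out = build_aho_corasick(patterns, alphabet)
    dead = [any(p >= k for p in o) for o in out]  # a forbidden pattern ends here
    req_bits = [sum(1 << p for p in o if p < k) for o in out]
    if dead[0]:
        return None
    full, n = (1 << k) - 1, len(x)
    start = (0, 0, req_bits[0])
    dist = {start: 0}
    dq, done = deque([start]), set()
    while dq:  # 0-1 BFS: nodes are popped in nondecreasing distance order
        node = dq.popleft()
        if node in done:
            continue
        done.add(node)
        i, s, m = node
        d = dist[node]
        if i == n and m == full:
            return d
        moves = []
        if i < n:
            moves.append(((i + 1, s, m), 1))  # delete x[i]
        for c in alphabet:
            t = delta[s][c]
            if dead[t]:
                continue  # emitting c would complete a forbidden pattern
            m2 = m | req_bits[t]
            if i < n:  # keep x[i] (cost 0) or mutate it into c (cost 1)
                moves.append(((i + 1, t, m2), 0 if c == x[i] else 1))
            moves.append(((i, t, m2), 1))  # insert c before x[i]
        for nxt, w in moves:
            if d + w < dist.get(nxt, float("inf")):
                dist[nxt] = d + w
                (dq.appendleft if w == 0 else dq.append)(nxt)
    return None
```

For example, `npcs_min_edits("aab", ["ba"], ["aa"], "ab")` returns 1, since mutating the first character yields "bab". The search graph has at most $(|x|+1)(\Upsilon+1)2^{|P|}$ nodes, each with $O(|\Sigma|)$ outgoing edges, which suggests where a bound exponential only in $|P|$ can come from; the exact bound proved in the paper may rest on a different construction.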
About the journal:
Theoretical Computer Science is mathematical and abstract in spirit, but it derives its motivation from practical and everyday computation. Its aim is to understand the nature of computation and, as a consequence of this understanding, provide more efficient methodologies. All papers introducing or studying mathematical, logic and formal concepts and methods are welcome, provided that their motivation is clearly drawn from the field of computing.