Hiding Patterns With Gaps in Sequential Data
Authors: Guiyuan Zhao, Dequan Chen, Meng Zhang
Journal: Concurrency and Computation: Practice and Experience, Volume 37, Issue 21-22
Published: 2025-08-14
DOI: 10.1002/cpe.70187
URL: https://onlinelibrary.wiley.com/doi/10.1002/cpe.70187
Abstract
String sanitization removes sensitive patterns from text data while retaining the usefulness of the remaining content. The task becomes especially demanding when sensitive patterns may span variable-length gaps (VLGs), a scenario common in fields such as bioinformatics, web analysis, and network traffic monitoring. In this work, we formalize the Pattern Hide with Gaps (PHG) problem, which extends traditional string sanitization to handle VLG patterns. To solve PHG, we introduce three novel sanitization algorithms that balance different aspects of data utility: the first rapidly removes all sensitive patterns to achieve basic sanitization; the second carefully selects replacements to minimize overall distortion; and the third reduces the loss of frequent patterns to improve the accuracy of subsequent frequent pattern mining. Extensive experiments demonstrate that our methods run efficiently and maintain high data utility.
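To make the problem setting concrete, the following is a minimal sketch of what hiding a pattern with variable-length gaps could look like. It is not one of the three algorithms from the paper, whose details the abstract does not give. The pattern encoding (a string of symbols plus one `(min_gap, max_gap)` pair per adjacent symbol pair), the function names `find_occurrence` and `naive_sanitize`, and the mask character `#` are all assumptions made for illustration. The sketch assumes non-empty patterns and a mask character that appears in no pattern.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of a VLG pattern: the symbols to match, plus one
# (min_gap, max_gap) pair for each pair of adjacent symbols.
Pattern = Tuple[str, List[Tuple[int, int]]]


def find_occurrence(text: str, pattern: Pattern) -> Optional[List[int]]:
    """Return the text positions of one occurrence of the pattern, or None."""
    symbols, gaps = pattern

    def extend(pos: int, k: int) -> Optional[List[int]]:
        # text[pos] already matches symbols[k]; try to match the remainder.
        if k == len(symbols) - 1:
            return [pos]
        lo, hi = gaps[k]
        first = pos + 1 + lo                      # skip at least `lo` characters
        last = min(pos + 1 + hi, len(text) - 1)   # skip at most `hi` characters
        for nxt in range(first, last + 1):
            if text[nxt] == symbols[k + 1]:
                rest = extend(nxt, k + 1)
                if rest is not None:
                    return [pos] + rest
        return None

    for start, ch in enumerate(text):
        if ch == symbols[0]:
            found = extend(start, 0)
            if found is not None:
                return found
    return None


def naive_sanitize(text: str, patterns: List[Pattern], mask: str = "#") -> str:
    """Mask characters until no sensitive pattern occurs in the text.

    Illustrative baseline only: it masks the first symbol of each occurrence
    it finds and makes no attempt to minimize distortion or to preserve
    frequent patterns.
    """
    chars = list(text)
    for pattern in patterns:
        while True:
            occurrence = find_occurrence("".join(chars), pattern)
            if occurrence is None:
                break
            chars[occurrence[0]] = mask
    return "".join(chars)


# "a", then 0-2 arbitrary characters, then "b", then 1-3 arbitrary characters, then "c".
sensitive = [("abc", [(0, 2), (1, 3)])]
print(naive_sanitize("xaybzzc", sensitive))  # x#ybzzc
```

Masking never creates a new occurrence, because gaps accept any character while symbol positions require a specific one, so the loop terminates with every listed pattern hidden. The backtracking search can be slow on adversarial inputs, and choosing which character to mask is where the utility trade-offs described in the abstract would arise.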
About the journal:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.