Hiding Patterns With Gaps in Sequential Data

IF 1.5 | Zone 4, Computer Science | JCR Q3, COMPUTER SCIENCE, SOFTWARE ENGINEERING
Guiyuan Zhao, Dequan Chen, Meng Zhang
Concurrency and Computation-Practice & Experience, vol. 37, issue 21-22 | Journal Article | Published 2025-08-14
DOI: 10.1002/cpe.70187 | https://onlinelibrary.wiley.com/doi/10.1002/cpe.70187
Citations: 0

Abstract

String sanitization addresses the challenge of removing sensitive patterns from text data while retaining the usefulness of the remaining content. The task becomes especially demanding when sensitive patterns may span variable-length gaps (VLGs), a scenario common in fields like bioinformatics, web analysis, and network traffic monitoring. In this work, we formalize the Pattern Hide with Gaps (PHG) problem, extending traditional string sanitization to handle VLG patterns. To solve PHG, we introduce three novel sanitization algorithms that balance different aspects of data utility: the first algorithm rapidly removes all sensitive patterns to achieve basic sanitization; the second carefully selects replacements to minimize overall distortion; and the third focuses on reducing the loss of frequent patterns to improve the accuracy of subsequent frequent pattern mining tasks. Extensive experiments demonstrate that our methods run efficiently and maintain high data utility.
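The abstract describes sensitive patterns whose symbols may be separated by variable-length gaps, and a first algorithm that simply removes every occurrence. The Python sketch below illustrates those two ideas under stated assumptions; it is not the authors' PHG algorithm. It assumes a gapped pattern is a string `chars` plus a list `gaps` of inclusive `(min, max)` bounds on how many arbitrary symbols may separate consecutive characters, and that sanitization replaces a symbol with a hypothetical mask symbol `#` that never occurs in any sensitive pattern. The names `find_occurrence`, `sanitize`, and `MASK` are illustrative only.

```python
from typing import Optional

# Hypothetical mask symbol; assumed never to appear in any sensitive pattern.
MASK = "#"

Gaps = list[tuple[int, int]]


def find_occurrence(s: str, chars: str, gaps: Gaps) -> Optional[list[int]]:
    """Return the positions of one occurrence of a gapped pattern in s, or None.

    Assumed pattern model: chars[j] is followed by between gaps[j][0] and
    gaps[j][1] (inclusive) arbitrary symbols before chars[j + 1].
    """
    if len(gaps) != len(chars) - 1:
        raise ValueError("need exactly one (min, max) gap per adjacent pair")
    # Layer j maps each position matching chars[j] to the position of chars[j-1]
    # it was reached from (-1 for the first layer).
    layers: list[dict[int, int]] = [{q: -1 for q, c in enumerate(s) if c == chars[0]}]
    if not layers[0]:
        return None
    for j in range(1, len(chars)):
        lo, hi = gaps[j - 1]
        layer: dict[int, int] = {}
        for r in layers[j - 1]:
            # Exactly q - r - 1 symbols lie strictly between positions r and q.
            for q in range(r + 1 + lo, min(r + 1 + hi, len(s) - 1) + 1):
                if s[q] == chars[j] and q not in layer:
                    layer[q] = r
        if not layer:
            return None
        layers.append(layer)
    # Walk the parent links back from the leftmost end position.
    q = min(layers[-1])
    positions = [q]
    for j in range(len(chars) - 1, 0, -1):
        q = layers[j][q]
        positions.append(q)
    return positions[::-1]


def sanitize(s: str, patterns: list[tuple[str, Gaps]]) -> tuple[str, int]:
    """Mask one symbol per found occurrence until no pattern occurs; return (text, masks)."""
    out = list(s)
    masks = 0
    for chars, gaps in patterns:
        # Masking only destroys matches (MASK matches no pattern symbol, and gaps
        # accept any symbol), so patterns handled earlier never reappear.
        while (occ := find_occurrence("".join(out), chars, gaps)) is not None:
            out[occ[-1]] = MASK  # arbitrary choice: mask the last matched symbol
            masks += 1
    return "".join(out), masks


if __name__ == "__main__":
    print(sanitize("ACGTACGT", [("AT", [(1, 2)])]))  # ('ACG#ACG#', 2)
```

For example, with the pattern `A`, then 1 to 2 arbitrary symbols, then `T`, the string `ACGTACGT` contains occurrences at positions (0, 3) and (4, 7); masking the last symbol of each yields `ACG#ACG#` after 2 replacements. Always masking the last matched symbol is an arbitrary choice made for brevity. The abstract's second and third algorithms instead choose replacements to limit overall distortion or the loss of frequent patterns, which this sketch does not attempt.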

Source journal: Concurrency and Computation-Practice & Experience
Category: Engineering and Technology, Computer Science: Theory and Methods
CiteScore: 5.00
Self-citation rate: 10.00%
Articles published: 664
Review time: 9.6 months
Journal description: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.