Multi-core parallel algorithms for hiding high-utility sequential patterns

IF 7.2 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems | Pub date: 2022-02-15 | DOI: 10.1016/j.knosys.2021.107793

Ut Huynh, Bac Le, Duy-Tai Dinh, Hamido Fujita

Volume 237, Article 107793 | https://www.sciencedirect.com/science/article/pii/S0950705121010017

Citations: 6

Abstract



High-utility sequential pattern mining (HUSPM) algorithms discover useful information from data. They can be applied in many domains, such as retail, market basket analysis, click-stream analysis, healthcare data analysis, and bioinformatics. The same capability has a downside: competitors who obtain leaked data can run a HUSPM algorithm on it and disclose sensitive patterns. High-utility sequential pattern hiding (HUSPH) is therefore used to protect private information from HUSPM algorithms. This paper proposes three algorithms for hiding high-utility sequential patterns in quantitative sequence datasets: High Utility Sequential Pattern Hiding Using Pure Array Structure (USHPA), High Utility Sequential Pattern Hiding Using Parallel Strategy (USHP), and High Utility Sequential Pattern Hiding Using Random Distribution Strategy (USHR). These algorithms use a proposed data structure named Pattern Utility Set for Hiding (PUSH) to speed up the hiding process. We also introduce a metric called Privacy Factor to evaluate the quality of hiding results. Comparative experiments on real datasets evaluate the proposed algorithms in terms of runtime, memory consumption, scalability, missing cost, and privacy factor. The results show that the proposed algorithms efficiently sanitize the input datasets and outperform the compared algorithms on all metrics.
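The following minimal Python sketch makes the terms in the abstract concrete under stated assumptions. It is not USHPA, USHP, USHR, or the PUSH structure, because the abstract does not describe their internals. It only illustrates the setting these algorithms work in.

The sketch assumes the utility definition common in the HUSPM literature:
- An item's utility in an itemset is its internal quantity times its external profit.
- A pattern's utility in a sequence is the maximum over its occurrences.
- A pattern's utility in the database is the sum over all sequences.

Hiding is shown as a naive heuristic that decrements quantities of the sensitive pattern's items until its utility falls below the minimum utility threshold. Missing cost is assumed to be the fraction of non-sensitive high-utility patterns that are no longer high-utility after sanitization. Privacy Factor is not modeled, since the abstract does not define it. All function names, data, and profits are hypothetical.

```python
import copy
from typing import Dict, List, Optional, Tuple

# Assumed data model (hypothetical, for illustration only):
QSequence = List[Dict[str, int]]  # ordered itemsets, each mapping item -> internal quantity
Pattern = List[Tuple[str, ...]]   # ordered itemsets of a sequential pattern
Profit = Dict[str, int]           # external utility (unit profit) of each item


def best_occurrence(pattern: Pattern, seq: QSequence,
                    profit: Profit) -> Tuple[int, Optional[List[int]]]:
    """Return (utility, positions) of the highest-utility occurrence of
    `pattern` in `seq`, where positions[k] is the itemset index matched by
    pattern[k]; (0, None) if the pattern does not occur. Exhaustive search,
    so only suitable for toy data."""
    best: Tuple[int, Optional[List[int]]] = (0, None)

    def search(k: int, start: int, acc: int, pos: List[int]) -> None:
        nonlocal best
        if k == len(pattern):
            if acc > best[0]:
                best = (acc, list(pos))
            return
        for j in range(start, len(seq)):
            if all(item in seq[j] for item in pattern[k]):
                gain = sum(seq[j][item] * profit[item] for item in pattern[k])
                pos.append(j)
                search(k + 1, j + 1, acc + gain, pos)
                pos.pop()

    search(0, 0, 0, [])
    return best


def db_utility(pattern: Pattern, db: List[QSequence], profit: Profit) -> int:
    """Utility of `pattern` in the database: sum of per-sequence maxima."""
    return sum(best_occurrence(pattern, seq, profit)[0] for seq in db)


def hide_pattern(pattern: Pattern, db: List[QSequence], profit: Profit,
                 min_util: int) -> None:
    """Naive in-place hiding heuristic (not USHPA/USHP/USHR): repeatedly take
    the sequence where the pattern has the highest utility and decrement the
    matched item with the largest utility, until utility < min_util."""
    assert min_util > 0, "a positive threshold guarantees termination"
    while db_utility(pattern, db, profit) >= min_util:
        seq = max(db, key=lambda s: best_occurrence(pattern, s, profit)[0])
        _, pos = best_occurrence(pattern, seq, profit)
        assert pos is not None  # utility > 0, so an occurrence exists
        j, item = max(((p, i) for p, itemset in zip(pos, pattern) for i in itemset),
                      key=lambda pi: seq[pi[0]][pi[1]] * profit[pi[1]])
        seq[j][item] -= 1
        if seq[j][item] == 0:
            del seq[j][item]  # quantity exhausted: remove the item


def missing_cost(nonsensitive: List[Pattern], before: List[QSequence],
                 after: List[QSequence], profit: Profit, min_util: int) -> float:
    """Assumed definition: share of non-sensitive high-utility patterns of
    `before` that are no longer high-utility in `after`."""
    hu_before = [p for p in nonsensitive if db_utility(p, before, profit) >= min_util]
    lost = [p for p in hu_before if db_utility(p, after, profit) < min_util]
    return len(lost) / len(hu_before) if hu_before else 0.0


# Hypothetical toy quantitative sequence database.
profit = {"a": 2, "b": 3, "c": 5}
db = [
    [{"a": 2}, {"b": 1, "c": 3}, {"a": 1, "b": 2}],
    [{"b": 2}, {"a": 3, "c": 1}],
    [{"a": 1}, {"c": 2}, {"b": 4}],
]
sensitive = [("a",), ("b",)]  # the pattern <a, b>
original = copy.deepcopy(db)
print(db_utility(sensitive, db, profit))  # 24
hide_pattern(sensitive, db, profit, min_util=20)
print(db_utility(sensitive, db, profit))  # 18
print(missing_cost([[("b",)], [("c",)]], original, db, profit, 20))  # 0.5
```

On this toy data, two decrements push the sensitive pattern <a, b> from utility 24 down to 18, below the threshold of 20. The non-sensitive pattern <c> stays high-utility, but <b> is lost as a side effect, so the missing cost is 0.5. Side effects of this kind are what the missing-cost metric is meant to capture.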


Source journal

Knowledge-Based Systems (Engineering & Technology - Computer Science: Artificial Intelligence)

CiteScore: 14.80

Self-citation rate: 12.50%

Articles published: 1245

Review time: 7.8 months

About the journal: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.