Multi-core parallel algorithms for hiding high-utility sequential patterns

IF 7.2 · Region 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Ut Huynh, Bac Le, Duy-Tai Dinh, Hamido Fujita
Knowledge-Based Systems, vol. 237, Article 107793. Published 2022-02-15. Journal article.
DOI: 10.1016/j.knosys.2021.107793
https://www.sciencedirect.com/science/article/pii/S0950705121010017
Citations: 6

Abstract

Multi-core parallel algorithms for hiding high-utility sequential patterns

High-utility sequential pattern mining (HUSPM) has applications in retail, market basket analysis, click-stream analysis, healthcare data analysis, and bioinformatics. HUSPM algorithms discover useful information from data. However, competitors who run a HUSPM algorithm on leaked data can also uncover sensitive patterns. High-utility sequential pattern hiding (HUSPH) is therefore used to protect private information from HUSPM algorithms. This paper proposes three algorithms for hiding high-utility sequential patterns in quantitative sequence datasets: High Utility Sequential Pattern Hiding Using Pure Array Structure (USHPA), High Utility Sequential Pattern Hiding Using Parallel Strategy (USHP), and High Utility Sequential Pattern Hiding Using Random Distribution Strategy (USHR). These algorithms use a proposed data structure named Pattern Utility Set for Hiding (PUSH) to speed up the hiding process. We also introduce a metric called Privacy Factor to evaluate the quality of hiding results. Comparative experiments were conducted on real datasets to evaluate the proposed algorithms in terms of runtime, memory consumption, scalability, missing cost, and privacy factor. Results show that the proposed algorithms can efficiently sanitize the input datasets and that they outperform the compared algorithms on all metrics.
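The abstract does not define pattern utility, so the following is only a minimal sketch under the commonly used HUSPM formulation, taken here as an assumption: each item has a unit utility, each sequence records item quantities per itemset, the utility of a pattern in one sequence is the maximum over its occurrences, and the utility in the dataset is the sum over sequences. A pattern counts as high-utility when that sum reaches a threshold, and hiding means sanitizing the dataset so a sensitive pattern falls below it. All names below (`pattern_utility_in_sequence`, `database_utility`, `min_util`, the toy data) are hypothetical, and the code is not USHPA, USHP, USHR, or the PUSH structure.

```python
from typing import Dict, FrozenSet, List

QSequence = List[Dict[str, int]]   # each itemset maps item -> quantity
Pattern = List[FrozenSet[str]]     # ordered list of itemsets


def pattern_utility_in_sequence(pattern: Pattern, qseq: QSequence,
                                unit_utility: Dict[str, int]) -> int:
    """Maximum utility over all occurrences of `pattern` in `qseq` (0 if none)."""
    neg = float("-inf")
    # best[j] = best utility found so far for matching the first j pattern itemsets
    best = [0] + [neg] * len(pattern)
    for itemset in qseq:
        # Go from high j to low j so one itemset matches at most one pattern position.
        for j in range(len(pattern), 0, -1):
            if best[j - 1] != neg and pattern[j - 1].issubset(itemset):
                gain = sum(itemset[i] * unit_utility[i] for i in pattern[j - 1])
                best[j] = max(best[j], best[j - 1] + gain)
    return 0 if best[-1] == neg else best[-1]


def database_utility(pattern: Pattern, db: List[QSequence],
                     unit_utility: Dict[str, int]) -> int:
    """Sum of the per-sequence utilities of `pattern` over the whole dataset."""
    return sum(pattern_utility_in_sequence(pattern, s, unit_utility) for s in db)


# Toy data (assumed for illustration only).
unit_utility = {"a": 2, "b": 5, "c": 1}
db = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 4}],
    [{"a": 2}, {"b": 1, "c": 2}],
]
sensitive = [frozenset({"a"}), frozenset({"c"})]
min_util = 10

total = database_utility(sensitive, db, unit_utility)
is_high_utility = total >= min_util  # a hiding algorithm would aim to make this False
```

In this toy dataset the pattern "a followed by c" has utility 5 in the first sequence and 6 in the second, so it is high-utility for `min_util = 10`. A hiding algorithm would modify quantities in the dataset until the total drops below the threshold, while trying to disturb non-sensitive patterns as little as possible.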

Source journal
Knowledge-Based Systems
Category: Engineering Technology - Computer Science: Artificial Intelligence
CiteScore: 14.80
Self-citation rate: 12.50%
Publication volume: 1245 articles
Review time: 7.8 months
Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.