Multi-core parallel algorithms for hiding high-utility sequential patterns
Authors: Ut Huynh, Bac Le, Duy-Tai Dinh, Hamido Fujita
Journal: Knowledge-Based Systems, vol. 237, Article 107793 (published 2022-02-15)
DOI: 10.1016/j.knosys.2021.107793
URL: https://www.sciencedirect.com/science/article/pii/S0950705121010017
Category: Computer Science, Artificial Intelligence (JCR Q1; impact factor 7.2)
High-utility sequential pattern mining (HUSPM) has applications in retail, market basket analysis, click-stream analysis, healthcare data analysis, and bioinformatics. HUSPM algorithms discover useful information from data. However, the same techniques can expose sensitive patterns: a competitor who runs a HUSPM algorithm on leaked data can disclose them. High-utility sequential pattern hiding (HUSPH) is therefore used to protect private information from HUSPM algorithms. This paper proposes three algorithms for hiding high-utility sequential patterns in quantitative sequence datasets: High Utility Sequential Pattern Hiding Using Pure Array Structure (USHPA), High Utility Sequential Pattern Hiding Using Parallel Strategy (USHP), and High Utility Sequential Pattern Hiding Using Random Distribution Strategy (USHR). All three use a proposed data structure, Pattern Utility Set for Hiding (PUSH), to speed up the hiding process. We also introduce a metric called Privacy Factor to evaluate the quality of the hiding results. Comparative experiments on real datasets evaluate the proposed algorithms in terms of runtime, memory consumption, scalability, missing cost, and privacy factor. The results show that the proposed algorithms sanitize the input datasets efficiently and outperform the compared algorithms on all metrics.
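To make the hiding task concrete, the following is a minimal sketch of what "hiding" a pattern in a quantitative sequence dataset can mean. It is not USHPA, USHP, or USHR, and it does not use the PUSH structure; it is a naive greedy baseline written only for illustration. It assumes a common HUSPM convention: the utility of a pattern in one sequence is the maximum, over all its occurrences, of the summed quantity times unit profit, and the utility in the dataset is the sum over sequences. The data, the names `PROFIT`, `DATABASE`, `pattern_utility`, `database_utility`, and `naive_hide`, and the restriction to single-item pattern elements are all hypothetical.

```python
from copy import deepcopy

# Hypothetical toy data. Each q-sequence is a list of itemsets;
# each itemset maps an item to its purchased quantity.
PROFIT = {"a": 5, "b": 2, "c": 1}  # assumed unit profit per item
DATABASE = [
    [{"a": 2, "b": 1}, {"c": 3}, {"b": 4}],
    [{"a": 1}, {"b": 2, "c": 1}],
    [{"c": 2}, {"a": 1}],
]


def pattern_utility(sequence, pattern, profit):
    """Maximum utility over all occurrences of `pattern` in one q-sequence.

    `pattern` is a list of single items that must appear in strictly
    increasing itemset positions (a simplification for this sketch).
    """
    # best[j] = best utility of matching the first j pattern items so far
    best = [0] + [None] * len(pattern)
    for itemset in sequence:
        # Descending j ensures one itemset matches at most one pattern item.
        for j in range(len(pattern), 0, -1):
            item = pattern[j - 1]
            if item in itemset and best[j - 1] is not None:
                cand = best[j - 1] + itemset[item] * profit[item]
                if best[j] is None or cand > best[j]:
                    best[j] = cand
    return best[-1] or 0


def database_utility(database, pattern, profit):
    """Sum of per-sequence pattern utilities."""
    return sum(pattern_utility(s, pattern, profit) for s in database)


def naive_hide(database, pattern, profit, min_util):
    """Greedily lower quantities until the pattern's utility is below min_util.

    Returns a sanitized copy; the input database is left unchanged.
    """
    if min_util <= 0:
        raise ValueError("min_util must be positive")
    db = deepcopy(database)
    # Assumption: always reduce the pattern item with the highest unit profit.
    target = max(pattern, key=lambda i: profit[i])
    while database_utility(db, pattern, profit) >= min_util:
        # The sequence contributing the most utility contains the pattern.
        seq = max(db, key=lambda s: pattern_utility(s, pattern, profit))
        itemset = max((s for s in seq if target in s), key=lambda s: s[target])
        itemset[target] -= 1
        if itemset[target] == 0:
            del itemset[target]
    return db


sanitized = naive_hide(DATABASE, ["a", "b"], PROFIT, min_util=20)
print(database_utility(DATABASE, ["a", "b"], PROFIT))   # 27
print(database_utility(sanitized, ["a", "b"], PROFIT))  # 9
```

In this toy run the pattern a, b drops from utility 27 to 9, below the threshold of 20, so a miner using that threshold would no longer report it. The sketch ignores side effects on non-sensitive patterns, which is what metrics such as missing cost are meant to capture.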
About the journal:
Knowledge-Based Systems is an international, interdisciplinary journal in artificial intelligence that publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computational techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. It emphasizes applications in business, government, education, engineering, and healthcare.