Mining High Utility Sequences with a Novel Utility Function

2021 13th International Conference on Knowledge and Systems Engineering (KSE). Pub Date: 2021-11-10. DOI: 10.1109/KSE53942.2021.9648660

Hanh-Thong Huynh, Hai V. Duong, Tin C. Truong, Bac Le, Philippe Fournier-Viger



Mining High Utility Sequences with a Novel Utility Function

Mining high utility sequential patterns (HUSP) is a popular data mining task. The goal is to find all subsequences that yield a high utility (e.g. high profit) in a quantitative sequence database (QSDB). Traditional algorithms for this task have many uses but a major limitation is that they rely on the maximum or minimum utility measures for calculating the utility of a pattern, thus assuming either a best or worst case scenario. These measures are unsuitable for many real-life applications such as business decision-making. To address this issue, this paper introduces a novel utility function (NUF) to calculate the utility of a sequence in each input sequence, which provides a trade-off between the above two extreme cases. A novel upper bound on NUF is designed as well as search space pruning strategies to eliminate unpromising candidate patterns early. These contributions are integrated into a novel efficient algorithm named FHNewUSM to discover frequent HUSPs with NUF. An experimental study with both real-life and synthetic databases shows that the proposed algorithm is efficient for mining HUSPs with NUF in terms of execution time, memory consumption and scalability.
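To make the "best or worst case" point concrete, the following minimal sketch computes the maximum and minimum utility of a pattern in a single quantitative sequence. The data layout (a list of itemsets mapping items to quantities, plus a unit-profit table) and all names such as `occurrence_utilities`, `qseq`, and `profit` are assumptions chosen for illustration, not the paper's notation. The abstract does not define NUF, so it is deliberately not implemented here; the sketch only shows the two extreme measures that NUF is said to trade off between.

```python
from itertools import combinations


def occurrence_utilities(pattern, qseq, profit):
    """Return the utility of every occurrence of `pattern` in `qseq`.

    pattern: list of sets of items (hypothetical representation).
    qseq:    list of dicts mapping item -> purchased quantity.
    profit:  dict mapping item -> unit profit (external utility).
    """
    utilities = []
    # Try every strictly increasing choice of positions for the pattern's itemsets.
    for positions in combinations(range(len(qseq)), len(pattern)):
        pairs = list(zip(pattern, positions))
        # An occurrence requires each pattern itemset to be contained
        # in the itemset at its chosen position.
        if all(itemset.issubset(qseq[pos]) for itemset, pos in pairs):
            utilities.append(
                sum(qseq[pos][item] * profit[item]
                    for itemset, pos in pairs
                    for item in itemset)
            )
    return utilities


# Toy data, invented for this example only.
profit = {"a": 2, "b": 5, "c": 1}
qseq = [{"a": 2, "b": 1}, {"a": 1, "c": 3}, {"b": 2}, {"b": 4}]
pattern = [{"a"}, {"b"}]

utils = occurrence_utilities(pattern, qseq, profit)
print(utils)                   # [14, 24, 12, 22]
print(max(utils), min(utils))  # 24 12
```

The same pattern is worth 24 under the maximum measure and 12 under the minimum measure in this one sequence, which illustrates the gap between the optimistic and pessimistic extremes that the abstract describes. Enumerating all position combinations is exponential in general and is used here only for clarity.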
