A novel datacube model supporting interactive web-log mining

Tadashi Ohmori, Yuichi Tsutatani, M. Hoshi
First International Symposium on Cyber Worlds, 2002. Proceedings. Published 2002-11-06.
DOI: 10.1109/CW.2002.1180909
Citations: 4

Abstract

Web-log mining is a technique for finding "useful" information in access-log data. Typically, association rule mining is used to find frequent patterns (or sequential patterns) of visited pages in access logs and to build models of user behavior from those patterns. A difficulty with this approach is that a human decision-maker must repeat the mining process many times: under different constraining conditions, for different groups of pages, and at different levels of abstraction. To support this process, this paper proposes a novel datacube model called the itemset cube. This cube manages frequent itemsets under various conditions, which are modeled as an n-dimensional space. An itemset cube is materialized, sliced, and rolled up repeatedly, in the same way that a traditional scalar datacube is used for interactive analysis of scalar values. Although this looks simple, executing these operations quickly on an itemset cube is difficult. This is because different cells of an itemset cube contain different numbers of records, yet all cells must use the same threshold ratio in order to detect frequent itemsets of equal quality. In this paper, a datacube model for storing frequent itemsets is described, and an efficient algorithm for the associated operations is proposed. Its application to a real-life dataset is also demonstrated.
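The abstract does not spell out the authors' data structures or algorithm, so the following is only a minimal Python sketch under our own assumptions. Each cell of a hypothetical `ItemsetCube` is keyed by a tuple of dimension values (for example day and region) and holds the page sets of the sessions in that cell. Materialization mines every base cell with the same relative threshold `min_ratio`; `slice` selects the cells with a fixed value on one dimension; `roll_up` merges cells along one dimension. Itemsets are counted by brute force up to `max_size` pages instead of with a real miner such as Apriori. The roll-up step uses the standard partition property, not necessarily the paper's method: because every cell applies the same ratio, an itemset that is frequent in a merged cell must be frequent in at least one child cell, so the union of the children's results is a complete candidate set that only needs to be recounted.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import combinations


def count_itemsets(transactions, max_size):
    """Brute-force support counts for all itemsets of 1..max_size pages."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return counts


def frequent(counts, n, min_ratio):
    """Keep itemsets whose support ratio count/n reaches min_ratio (exact comparison)."""
    if n == 0:
        return {}
    return {s: c for s, c in counts.items() if Fraction(c, n) >= min_ratio}


class ItemsetCube:
    """Hypothetical itemset cube: one set of frequent itemsets per cell."""

    def __init__(self, dims, records, min_ratio, max_size=2):
        # records: iterable of (coords, pages); coords is a tuple aligned with dims.
        self.dims = tuple(dims)
        self.min_ratio = min_ratio
        self.max_size = max_size
        self.cells = defaultdict(list)  # coords -> list of page sets (sessions)
        for coords, pages in records:
            self.cells[coords].append(frozenset(pages))
        # Materialize: mine every base cell with the SAME ratio threshold,
        # even though cells hold different numbers of sessions.
        self.frequent = {
            coords: frequent(count_itemsets(txns, max_size), len(txns), min_ratio)
            for coords, txns in self.cells.items()
        }

    def slice(self, dim, value):
        """Return the base cells whose coordinate on `dim` equals `value`."""
        i = self.dims.index(dim)
        return {c: f for c, f in self.frequent.items() if c[i] == value}

    def roll_up(self, dim):
        """Aggregate away `dim`, returning frequent itemsets per parent cell."""
        i = self.dims.index(dim)
        groups = defaultdict(list)
        for coords in self.cells:
            groups[coords[:i] + coords[i + 1:]].append(coords)
        result = {}
        for parent, children in groups.items():
            # If an itemset missed the ratio in every child, its total count would
            # also miss it in the parent, so the children's frequent itemsets
            # form a complete candidate set for the parent.
            candidates = set().union(*(self.frequent[c] for c in children))
            txns = [t for c in children for t in self.cells[c]]
            counts = Counter()
            for t in txns:
                for s in candidates:
                    if s <= t:
                        counts[s] += 1
            # Candidates must be recounted: a ratio met in a small child cell
            # may not be met once its sessions are pooled with other cells.
            result[parent] = frequent(counts, len(txns), self.min_ratio)
        return result
```

For instance, with `dims = ("day", "region")`, `min_ratio = Fraction(1, 2)`, and the Japanese sessions `{home, news}` and `{home, news, sports}` on Monday plus `{news, sports}` on Tuesday, the itemset `{home, sports}` is frequent in the (Monday, jp) cell (1 of 2 sessions) but drops out after `roll_up("day")` (1 of 3 sessions). This is the effect the abstract points to: cells of different sizes share one ratio, so results cannot simply be summed. Using `Fraction` keeps the ratio test exact, which the partition argument relies on.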