Mining High Utility Web Access Sequences in Dynamic Web Log Data

Chowdhury Farhan Ahmed, S. Tanbeer, Byeong-Soo Jeong
Published in: 2010 11th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing
Publication date: 2010-06-09
DOI: 10.1109/SNPD.2010.21 (https://doi.org/10.1109/SNPD.2010.21)
Citation count: 67

Abstract

Mining web access sequences can uncover useful knowledge from web logs and has broad applications. Treating non-binary occurrences of web pages as internal utilities in web access sequences, e.g., the time each user spends on a web page, allows more realistic information to be extracted. However, the existing utility-based approach has several limitations: it considers only forward references of web access sequences, is not applicable to incremental mining, suffers from the level-wise candidate generation-and-test methodology, needs several database scans, and does not show how to mine web traversal sequences with external utility, i.e., different impacts or significance for different web pages. In this paper, we propose a new approach that solves these problems. We propose two novel tree structures, UWAS-tree (utility-based web access sequence tree) and IUWAS-tree (incremental UWAS tree), for mining web access sequences in static and dynamic databases, respectively. Our approach handles both forward and backward references and both static and dynamic data, avoids the level-wise candidate generation-and-test methodology, does not scan databases several times, and considers both internal and external utilities of a web page. Extensive performance analyses show that our approach is very efficient for both static and incremental mining of high utility web access sequences.
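To make the notions of internal and external utility concrete, the following minimal Python sketch scores a candidate page sequence against a handful of sessions. It is an illustration only and not the UWAS-tree or IUWAS-tree algorithm: the abstract does not give the paper's formal definitions, so the utility formula (internal utility multiplied by external utility, summed over a greedy leftmost match), the sample data, and every name in the code (`sequence_utility`, `is_high_utility`, `sessions`, `external`) are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# One session: ordered (page, internal_utility) pairs. Here the internal
# utility stands for time spent on the page (hypothetical values).
Session = List[Tuple[str, float]]

# Hypothetical log. The first session revisits page "a" after "b",
# i.e. it contains a backward reference.
sessions: List[Session] = [
    [("a", 2.0), ("b", 5.0), ("a", 1.0)],
    [("b", 3.0), ("c", 4.0)],
    [("a", 4.0), ("c", 1.0), ("b", 2.0)],
]

# Hypothetical external utilities: the relative significance of each page.
external: Dict[str, float] = {"a": 1.0, "b": 2.0, "c": 3.0}


def sequence_utility(pattern: List[str], session: Session,
                     external: Dict[str, float]) -> float:
    """Utility of `pattern` in one session, or 0.0 if it does not occur.

    Assumption: the pattern is matched as a subsequence using the
    leftmost occurrence of each page, and each matched page contributes
    internal utility * external utility. The paper may define this
    differently.
    """
    total = 0.0
    matched = 0
    for page, internal in session:
        if matched < len(pattern) and page == pattern[matched]:
            total += internal * external.get(page, 1.0)
            matched += 1
    return total if matched == len(pattern) else 0.0


def is_high_utility(pattern: List[str], sessions: List[Session],
                    external: Dict[str, float], min_util: float) -> bool:
    """True if the pattern's summed utility reaches the threshold."""
    total = sum(sequence_utility(pattern, s, external) for s in sessions)
    return total >= min_util


if __name__ == "__main__":
    for s in sessions:
        print(sequence_utility(["a", "b"], s, external))
    print(is_high_utility(["a", "b"], sessions, external, min_util=15.0))
```

This sketch rescans every session for each candidate pattern, which is the kind of repeated scanning the abstract says the tree-based approach is designed to avoid.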