A Fuzzy Set Theoretic Approach to Discover User Sessions from Web Navigational Data

2011 IEEE Recent Advances in Intelligent Computational Systems. Pub Date: 2011-11-03. DOI: 10.1109/RAICS.2011.6069435

Z. Ansari, A. V. Babu, W. Ahmed, Mohammad Fazle Azeem


Citations: 24

Abstract

Due to the continuous growth and increasing complexity of the WWW, web site publishers are facing increasing difficulty in attracting and retaining users. In order to design attractive web sites, designers must understand their users' needs. Therefore, analysing the navigational behaviour of users is an important part of web page design. Web Usage Mining (WUM) is the application of data mining techniques to web usage data in order to discover patterns that can be used to analyse users' navigational behaviour. Preprocessing, knowledge extraction and results analysis are the three main steps of WUM. Due to the large amount of irrelevant information present in web logs, the original log file cannot be used directly in the WUM process. During the preprocessing stage of WUM, raw web log data is transformed into a set of user profiles. Each user profile captures a set of URLs representing a user session. This sessionized data can be used as the input for a variety of data mining tasks such as clustering, association rule mining and sequence mining. If the data mining task at hand is clustering, the session files are filtered to remove very small sessions in order to eliminate noise from the data. However, direct removal of these small sessions may result in the loss of a significant amount of information, especially when the number of small sessions is large. We propose a “Fuzzy Set Theoretic” approach to deal with this problem. Instead of directly removing all the small sessions below a specified threshold, we assign weights to all the sessions using a “Fuzzy Membership Function” based on the number of URLs accessed by the sessions. After assigning the weights, we apply a “Fuzzy c-Means Clustering” algorithm to discover the clusters of user profiles.

In this paper, we provide a detailed review of various techniques to preprocess web log data, including data fusion, data cleaning, user identification and session identification. We also describe our methodology for performing the feature selection (or dimensionality reduction) and session weight assignment tasks. Finally, we compare our soft-computing-based approach of session weight assignment with the traditional hard-computing-based approach of small session elimination.
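
The contrast between hard elimination and fuzzy weighting of small sessions can be illustrated with a minimal Python sketch. The abstract does not give the shape or parameters of the membership function, so the linear ramp below, its bounds `low` and `high`, the function names, and the sample sessions are all assumptions made for illustration, not the authors' definitions.

```python
from typing import Dict, List


def session_weight(n_urls: int, low: int = 1, high: int = 5) -> float:
    """Assumed ramp membership: 0 at or below `low`, 1 at or above `high`,
    and linear in between. The paper's actual function may differ."""
    if n_urls <= low:
        return 0.0
    if n_urls >= high:
        return 1.0
    return (n_urls - low) / (high - low)


def hard_filter(sessions: Dict[str, List[str]], min_urls: int) -> Dict[str, List[str]]:
    """Hard approach: drop every session with fewer than `min_urls` URLs."""
    return {sid: urls for sid, urls in sessions.items() if len(urls) >= min_urls}


def soft_weights(sessions: Dict[str, List[str]], low: int = 1, high: int = 5) -> Dict[str, float]:
    """Soft approach: keep every session and attach a weight in [0, 1]."""
    return {sid: session_weight(len(urls), low, high) for sid, urls in sessions.items()}


# Hypothetical sessionized data: session id -> list of accessed URLs.
sessions = {
    "s1": ["/home"],
    "s2": ["/home", "/products"],
    "s3": ["/home", "/products", "/cart"],
    "s4": ["/home", "/a", "/b", "/c", "/d", "/e"],
}

kept = hard_filter(sessions, min_urls=3)  # s1 and s2 are discarded outright
weights = soft_weights(sessions)          # every session is retained with a weight

for sid in sessions:
    status = "kept" if sid in kept else "dropped"
    print(f"{sid}: {len(sessions[sid])} URLs, hard={status}, weight={weights[sid]:.2f}")
```

In this sketch the two-URL session `s2` is lost under the hard threshold but keeps a small non-zero weight under the soft scheme, so a subsequent clustering step could still draw on it with reduced influence. How the weights enter the clustering step is not stated in the abstract and is left out here.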
