Towards knowledge discovery from WWW log data
F. Tao, F. Murtagh
Proceedings International Conference on Information Technology: Coding and Computing (Cat. No.PR00540), 2000-03-27
DOI: 10.1109/ITCC.2000.844242 (https://doi.org/10.1109/ITCC.2000.844242)
As the result of interactions between visitors and a Web site, an HTTP log file contains rich knowledge about users' on-site behavior which, if fully exploited, can improve customer service and site performance. Unlike most existing log analysis tools, which rely on statistical counting summaries of pages, hosts, etc., we propose a transaction model to represent users' access history and a framework to adapt data mining techniques such as sequence and association rule mining to these transactions. In this framework, all transactions are extracted from the raw log file through a series of step-by-step data preparation phases. We discuss different methods to identify a user and to separate long, convoluted sequences into semantically meaningful sessions and transactions. A new feature called interestingness is defined to model user interest in different Web sections. With all transactions imported into an adapted cube structure that has a concept hierarchy attached to each of its dimensions, it is possible to carry out multi-dimensional data mining at multiple levels of abstraction. Using interest context rules, we demonstrate the potential value of this system prototype.
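The data preparation step described above (identify a user, then split that user's request sequence into sessions) can be sketched roughly as follows. This is a minimal illustration, not the authors' method: it assumes the log is in Common Log Format, treats the client host as the user identifier, and cuts a session whenever the gap between consecutive requests exceeds a 30-minute timeout. The names `parse_line`, `sessionize`, and `SAMPLE_LOG`, and the timeout value itself, are hypothetical choices made for this example; the abstract does not specify them.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed input: Common Log Format, i.e.
# host ident authuser [timestamp] "method path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return (host, timestamp, path), or None if the line does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("host"), ts, m.group("path")


def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group requests by host (assumed user id) and split on idle gaps.

    Returns a list of (host, [path, ...]) tuples, one per session.
    """
    by_user = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            host, ts, path = parsed
            by_user[host].append((ts, path))

    sessions = []
    for host, hits in by_user.items():
        hits.sort(key=lambda h: h[0])  # order each user's requests by time
        current = [hits[0][1]]
        for (prev_ts, _), (ts, path) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:  # idle gap too long: close the session
                sessions.append((host, current))
                current = []
            current.append(path)
        sessions.append((host, current))
    return sessions


# Hypothetical log lines, made up purely for illustration.
SAMPLE_LOG = [
    '10.0.0.1 - - [27/Mar/2000:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 1024',
    '10.0.0.1 - - [27/Mar/2000:10:05:00 +0000] "GET /products/a.html HTTP/1.0" 200 2048',
    '10.0.0.1 - - [27/Mar/2000:11:30:00 +0000] "GET /index.html HTTP/1.0" 200 1024',
    '10.0.0.2 - - [27/Mar/2000:10:01:00 +0000] "GET /about.html HTTP/1.0" 200 512',
]

if __name__ == "__main__":
    for host, paths in sessionize(SAMPLE_LOG):
        print(host, paths)
```

Identifying users by host alone is the crudest of the options the abstract alludes to; proxies and shared machines collapse several visitors into one host, so a real pipeline would likely combine it with other evidence such as cookies or the user agent.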