Web Mining from Interpretable Compressed Representation of Sparse Web
Connor C. J. Hryhoruk, C. Leung
2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 2022-11-01
DOI: 10.1109/WI-IAT55865.2022.00097 (https://doi.org/10.1109/WI-IAT55865.2022.00097)
Citations: 0
Abstract
Extracting implicit, previously unknown, and potentially useful information from large datasets is non-trivial and often runs into computational constraints. Such datasets are everywhere, with a popular example being the World Wide Web, which acts as a mass producer and consumer of data across multiple devices distributed worldwide. Discovering knowledge on the Web requires web intelligence solutions, which take advantage of data mining and data science. In web mining, the mining of web structures provides web surfers with commonly recommended web pages by examining the incoming and outgoing links on web pages. The web as a whole, however, is sparse: it has a high number of vertices (i.e., web pages) but comparatively few directed edges (i.e., incoming and outgoing hyperlinks between web pages). In this paper, we present a solution for mining frequent patterns from the sparse web. Exploiting this sparsity, we capture web pages in compressed bitmaps, which are then mined to discover these patterns. Our bitmap model ensures readability and flexibility, and allows important information to be captured across multiple 31-bit groups. The mining process is demonstrated on real-life web data to show its capacity to mine interesting patterns from an interpretable compressed representation of sparse data.
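The abstract does not spell out the bitmap layout, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes a vertical layout (one bitmap per target page, with bit i set when source page i links to it), splits each bitmap into 31-bit groups, and stores only the non-zero groups in a dictionary. The function names (`compress`, `intersect`, `support`, `frequent_pairs`) and the toy link data are hypothetical, and the mining step is limited to frequent pairs for brevity.

```python
from itertools import combinations

GROUP_BITS = 31  # group width mentioned in the abstract


def compress(positions):
    """Map set-bit positions to {group_index: 31-bit literal}; all-zero groups are skipped."""
    groups = {}
    for pos in positions:
        idx, offset = divmod(pos, GROUP_BITS)
        groups[idx] = groups.get(idx, 0) | (1 << offset)
    return groups


def intersect(a, b):
    """Bitwise AND of two compressed bitmaps; only groups present in both can survive."""
    out = {}
    for idx in a.keys() & b.keys():
        word = a[idx] & b[idx]
        if word:
            out[idx] = word
    return out


def support(bitmap):
    """Count the set bits, i.e. the number of source pages covered."""
    return sum(bin(word).count("1") for word in bitmap.values())


def frequent_pairs(out_links, min_support):
    """out_links maps a source page id (int) to the target pages it links to."""
    # Build the vertical layout: target page -> set of source page ids.
    positions = {}
    for src, targets in out_links.items():
        for target in targets:
            positions.setdefault(target, set()).add(src)
    bitmaps = {t: compress(p) for t, p in positions.items()}
    # Keep only targets that are frequent on their own.
    frequent = {t: b for t, b in bitmaps.items() if support(b) >= min_support}
    result = {}
    for x, y in combinations(sorted(frequent), 2):
        count = support(intersect(frequent[x], frequent[y]))
        if count >= min_support:
            result[(x, y)] = count
    return result


# Hypothetical toy data: source page ids are sparse, so most 31-bit groups are empty.
out_links = {
    0: ["a", "b"],
    1: ["a", "b", "c"],
    40: ["a", "b"],
    75: ["c"],
}
pairs = frequent_pairs(out_links, min_support=2)
print(pairs)
```

Because each stored group is a plain integer keyed by its group index, a set bit can be traced back to a page id as `group_index * 31 + offset`, which is one way such a representation can stay readable while skipping the empty regions of a sparse graph.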