{"title":"Web Proxy Log Data Mining System for Clustering Users and Search Keywords","authors":"T. Bilgin, M. Aytekin","doi":"10.18466/cbayarfbe.330088","DOIUrl":null,"url":null,"abstract":"In this study, Internet users were clustered by the search keywords which they type into search bars of search engines. Our proposed software is called UQCS (User Queries Clustering System) and it was developed to demonstrate the efficiency of our hypothesis. UQCS co-operates with the Strehl’s relationship based clustering toolkit and performs segmentation on users based on the keywords they use for searching the web. Internet Proxy server logs were parsed and query strings were extracted from the search engine URL’s and the resulting IP-Term matrix was converted into a similarity matrix using Euclidean, Jaccard, Cosine Distance and Pearson Correlation Distance metrics. K- Means and graph-based OPOSSUM algorithm were used to perform clustering on the similarity matrices. Results were illustrated by using CLUSION visualization toolkit.","PeriodicalId":9652,"journal":{"name":"Celal Bayar Universitesi Fen Bilimleri Dergisi","volume":"344 1","pages":"873-881"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Celal Bayar Universitesi Fen Bilimleri Dergisi","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18466/cbayarfbe.330088","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
In this study, Internet users were clustered by the search keywords which they type into search bars of search engines. Our proposed software is called UQCS (User Queries Clustering System) and it was developed to demonstrate the efficiency of our hypothesis. UQCS co-operates with the Strehl’s relationship based clustering toolkit and performs segmentation on users based on the keywords they use for searching the web. Internet Proxy server logs were parsed and query strings were extracted from the search engine URL’s and the resulting IP-Term matrix was converted into a similarity matrix using Euclidean, Jaccard, Cosine Distance and Pearson Correlation Distance metrics. K- Means and graph-based OPOSSUM algorithm were used to perform clustering on the similarity matrices. Results were illustrated by using CLUSION visualization toolkit.