Clustering Alexa Internet Data using Auto Encoder Network and Affinity Propagation
Ali Risheh, Ashkan Goharfar, N. T. Javan
2020 10th International Conference on Computer and Knowledge Engineering (ICCKE), published 2020-10-29
DOI: 10.1109/ICCKE50421.2020.9303705 (https://doi.org/10.1109/ICCKE50421.2020.9303705)
Citations: 0
Abstract
Non-linear mapping is a popular approach to clustering data with complex structure and distinct patterns. Autoencoder Networks (AENs) are widely used in clustering because they improve data representation. In this paper, we collect Alexa.com data by crawling the profiles of popular websites; the resulting dataset has 84 columns whose types are numbers and arrays of words. Next, we present an AEN architecture to identify specific websites with exceptional patterns; the encoded data forms a new feature space for the original data. The encoded data is clustered by Affinity Propagation, a partitioning algorithm that does not require the number of clusters to be specified in advance. The main results are based on 194 clusters and their exemplars, which are filtered and analyzed. One remarkable finding is that when the first 11 columns are used as raw data, Affinity Propagation does not cluster them and treats them all as outliers. The results are summarized by selecting the best option with respect to the statistics and charts. Some of the results are useful for website owners and offer suggestions and solutions for Search Engine Optimization (SEO). Finally, we present our crawler application, which crawls and records data over 4 days. The proposed web crawler faces several challenges, and the solutions to them can be helpful for most commonly used web crawling algorithms and libraries.
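The encode-then-cluster pipeline described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the AEN architecture, so a single-hidden-layer autoencoder built from scikit-learn's `MLPRegressor` stands in for it, and synthetic data stands in for the crawled Alexa dataset. The function names, the bottleneck size, and the damping value are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def encode_with_autoencoder(X, bottleneck=2, seed=0):
    """Train a one-hidden-layer autoencoder (X -> X) and return the hidden codes.

    Stand-in architecture (assumption): the paper's AEN is not specified
    in the abstract.
    """
    ae = MLPRegressor(
        hidden_layer_sizes=(bottleneck,),
        activation="tanh",
        solver="lbfgs",
        max_iter=500,
        random_state=seed,
    )
    ae.fit(X, X)  # target equals input, i.e. a reconstruction objective
    # Forward pass through the encoder half only (input -> hidden layer).
    return np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])


def cluster_codes(codes, damping=0.9, seed=0):
    """Cluster encoded points with Affinity Propagation.

    The number of clusters is not passed in; the algorithm selects exemplars.
    """
    ap = AffinityPropagation(damping=damping, random_state=seed)
    labels = ap.fit_predict(codes)
    return labels, ap.cluster_centers_indices_


# Synthetic stand-in data (assumption): 3 groups of 20 points in 8 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 8))
X = np.vstack([c + rng.normal(size=(20, 8)) for c in centers])
X = StandardScaler().fit_transform(X)

codes = encode_with_autoencoder(X)
labels, exemplars = cluster_codes(codes)
print("encoded shape:", codes.shape)
print("clusters found:", len(exemplars))
```

Each index in `exemplars` is a row of the input that Affinity Propagation chose as the representative of its cluster, which corresponds to the exemplars the abstract mentions. If the algorithm fails to converge, scikit-learn labels every sample `-1` and returns no exemplars.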