Grouping instances in kNN for classification based on computer mouse features
D. Chudá, Peter Krátky
International Conference on Computer Systems and Technologies, 2015-06-25
DOI: 10.1145/2812428.2812454 (https://doi.org/10.1145/2812428.2812454)
Citations: 5
Abstract
Computer mouse usage features can be used to distinguish web page visitors. Individual data instances representing a user's navigation actions are insufficient, when used separately, for classification with a basic k-nearest neighbors (kNN) classifier. We propose a modification of the kNN method in which instances of the same class form groups. The nearest neighbors are found by measuring the distance between histograms that represent the distributions of values in the corresponding groups. The paper presents a series of experiments on a dataset from 100 web visitors. It compares several distance metrics as well as different levels of grouping. Combining non-parametric test statistics as the distance measure with a suitable group size significantly improves the classification success rate.
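The grouping idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which non-parametric test statistics or bin settings were used, so the Kolmogorov-Smirnov-style statistic, the single scalar feature, the equal-width bins on [0, 1], and all function names (`histogram`, `ks_distance`, `make_groups`, `classify_group`) are assumptions made for illustration.

```python
from collections import Counter
from itertools import accumulate


def histogram(values, bins, lo, hi):
    """Normalized histogram over `bins` equal-width bins on [lo, hi] (assumed range)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = max(0, min(idx, bins - 1))  # clamp out-of-range values into edge bins
        counts[idx] += 1
    return [c / len(values) for c in counts]


def ks_distance(h1, h2):
    """Kolmogorov-Smirnov-style statistic: largest gap between cumulative histograms."""
    return max(abs(a - b) for a, b in zip(accumulate(h1), accumulate(h2)))


def make_groups(values, labels, group_size):
    """Split each class's instances into consecutive groups of `group_size`.

    Leftover instances that do not fill a whole group are dropped.
    """
    by_class = {}
    for v, y in zip(values, labels):
        by_class.setdefault(y, []).append(v)
    groups = []
    for y, vals in by_class.items():
        for i in range(0, len(vals) - group_size + 1, group_size):
            groups.append((vals[i:i + group_size], y))
    return groups


def classify_group(query_values, train_groups, k, bins=10, lo=0.0, hi=1.0):
    """Majority vote among the k training groups whose histograms are nearest."""
    q = histogram(query_values, bins, lo, hi)
    scored = sorted(
        ((ks_distance(q, histogram(vals, bins, lo, hi)), y) for vals, y in train_groups),
        key=lambda pair: pair[0],
    )
    votes = Counter(y for _, y in scored[:k])
    return votes.most_common(1)[0][0]


# Toy data for two hypothetical visitors; values stand in for one mouse feature.
values = [0.10, 0.15, 0.20, 0.25, 0.30, 0.12, 0.18, 0.22,
          0.70, 0.75, 0.80, 0.85, 0.90, 0.72, 0.78, 0.88]
labels = ["a"] * 8 + ["b"] * 8
train_groups = make_groups(values, labels, group_size=4)
predicted = classify_group([0.11, 0.19, 0.21, 0.29], train_groups, k=3)
```

In this sketch the query is itself a group of instances, so the classifier compares distributions rather than single points. The group size and `k` are the parameters one would vary to reproduce the kind of comparison the abstract describes.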