CW-kNN: an efficient kNN-based model for imbalanced dataset classification
Yi Xiang, Zhong Cao, Shaowen Yao, Jing He
International Conference on Critical Infrastructure Protection, 2018-11-02
DOI: 10.1145/3290420.3290431 (https://doi.org/10.1145/3290420.3290431)
The k-nearest neighbor (kNN) method is a popular classification method in data mining because it is simple to implement and classifies well. However, kNN does not scale well to large datasets. This paper proposes CLUKER, a novel kNN regression method based on hierarchical clustering. CLUKER uses hierarchical clustering to divide the original dataset into several parts, which reduces the query scope of kNN. To improve kNN's ability to handle imbalanced datasets, the paper also proposes a novel weighting method based on the local data distribution, called the LD-Weighting method. Finally, the two algorithms are integrated into CW-kNN, an efficient kNN-based model for imbalanced dataset classification. Experimental results show that the proposed methods perform well on different datasets.
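
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the two ideas it names: restricting the kNN query to one part of a hierarchically partitioned dataset, and weighting votes by the local class distribution. Everything specific here is an assumption made for illustration: the median-split `partition` is a simple top-down stand-in for the paper's hierarchical clustering, the inverse local class frequency in `classify` is a stand-in for LD-Weighting, and the function names and the `max_size` parameter are hypothetical. It is not the authors' CLUKER or CW-kNN.

```python
import math
from collections import Counter


def partition(items, max_size):
    """Top-down split (assumed stand-in for hierarchical clustering):
    bisect at the median of the widest feature until every part holds
    at most max_size (point, label) pairs."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if len(items) <= max_size:
        return [items]
    dims = range(len(items[0][0]))
    # Pick the feature with the largest spread of values.
    widest = max(
        dims,
        key=lambda i: max(p[i] for p, _ in items) - min(p[i] for p, _ in items),
    )
    ordered = sorted(items, key=lambda item: item[0][widest])
    mid = len(ordered) // 2
    return partition(ordered[:mid], max_size) + partition(ordered[mid:], max_size)


def centroid(part):
    """Mean point of one part."""
    points = [p for p, _ in part]
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def classify(query, parts, k):
    """kNN vote restricted to the part whose centroid is nearest the query."""
    part = min(parts, key=lambda pt: math.dist(query, centroid(pt)))
    neighbors = sorted(part, key=lambda item: math.dist(query, item[0]))[:k]
    # Assumed weighting: each neighbor's vote counts 1 / (size of its class
    # inside this part), so locally rare classes are not outvoted by default.
    local_counts = Counter(label for _, label in part)
    votes = Counter()
    for _, label in neighbors:
        votes[label] += 1.0 / local_counts[label]
    return max(votes, key=votes.get)


# Toy data: six majority-class points and two minority-class points.
train = [
    ((0.0, 0.0), "neg"), ((0.1, 0.2), "neg"), ((0.2, 0.1), "neg"),
    ((0.3, 0.3), "neg"), ((0.2, 0.4), "neg"), ((0.4, 0.2), "neg"),
    ((5.0, 5.0), "pos"), ((5.1, 4.9), "pos"),
]
parts = partition(train, max_size=4)
print(classify((4.8, 5.2), parts, k=3))  # prints "pos"
```

In this sketch a query compares distances against one part rather than the full training set, which is the scaling benefit the abstract attributes to clustering. The trade-off is that true nearest neighbors lying just across a part boundary are missed.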