Worasak Rueangsirarak, T. Laohapensaeng, S. Chansareewittaya, Anusorn Yodjaiphet
{"title":"余弦相似度技术去除冗余样本","authors":"Worasak Rueangsirarak, T. Laohapensaeng, S. Chansareewittaya, Anusorn Yodjaiphet","doi":"10.1109/WPMC48795.2019.9096106","DOIUrl":null,"url":null,"abstract":"The k-nearest neighbor algorithm is one of the basic and simple classification algorithms that share a common limitation of the algorithm which requires more computation cost when the size of training data is enlarged. To solve this problem, a new method applied to the cosine similarity for reducing the size of the training data set is proposed. This method reduces the data points that close to a decision boundary and retains the important points which affect classification accuracy. For the data far from the decision boundary and not affect the classification, these points will be removed from the training data set. The proposed method is evaluated its accuracy and reduction performance on the state of the art mechanisms, categorized as prototype selection algorithms. The 20 real-world data set are used to evaluate the proposed method. The experimental results are compared with 21 existing methods. As a result, our proposed method performs the best with 89.95% accuracy but has only a fair reduction ratio, when compared to other methods.","PeriodicalId":298927,"journal":{"name":"2019 22nd International Symposium on Wireless Personal Multimedia Communications (WPMC)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"The Cosine Similarity Technique for Removing the Redundancy Sample\",\"authors\":\"Worasak Rueangsirarak, T. Laohapensaeng, S. 
Chansareewittaya, Anusorn Yodjaiphet\",\"doi\":\"10.1109/WPMC48795.2019.9096106\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The k-nearest neighbor algorithm is one of the basic and simple classification algorithms that share a common limitation of the algorithm which requires more computation cost when the size of training data is enlarged. To solve this problem, a new method applied to the cosine similarity for reducing the size of the training data set is proposed. This method reduces the data points that close to a decision boundary and retains the important points which affect classification accuracy. For the data far from the decision boundary and not affect the classification, these points will be removed from the training data set. The proposed method is evaluated its accuracy and reduction performance on the state of the art mechanisms, categorized as prototype selection algorithms. The 20 real-world data set are used to evaluate the proposed method. The experimental results are compared with 21 existing methods. 
As a result, our proposed method performs the best with 89.95% accuracy but has only a fair reduction ratio, when compared to other methods.\",\"PeriodicalId\":298927,\"journal\":{\"name\":\"2019 22nd International Symposium on Wireless Personal Multimedia Communications (WPMC)\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 22nd International Symposium on Wireless Personal Multimedia Communications (WPMC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WPMC48795.2019.9096106\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 22nd International Symposium on Wireless Personal Multimedia Communications (WPMC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WPMC48795.2019.9096106","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The Cosine Similarity Technique for Removing the Redundancy Sample
The k-nearest neighbor algorithm is a simple and widely used classifier, but it has a well-known limitation: its computational cost grows as the training data set grows. To address this problem, a new method based on cosine similarity is proposed for reducing the size of the training data set. The method retains the data points close to the decision boundary, which are the points that most affect classification accuracy, and removes the points far from the boundary, which do not affect the classification. The proposed method is evaluated for accuracy and reduction performance against state-of-the-art prototype selection algorithms. Twenty real-world data sets are used in the evaluation, and the results are compared with 21 existing methods. The proposed method achieves the best accuracy, 89.95%, but only a fair reduction ratio compared with the other methods.
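The abstract describes the general idea (keep boundary-adjacent training points, drop interior ones) but not the exact algorithm. A minimal illustrative sketch of that idea, not the paper's actual method: score each point by the cosine similarity to its neighbors, and keep only points whose most similar neighbors include an opposite-class point, since those lie near the decision boundary. The function names and the neighborhood-size parameter `k` are assumptions for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def reduce_training_set(points, labels, k=3):
    """Illustrative prototype selection: keep the indices of points whose k
    most cosine-similar neighbors include at least one opposite-class point
    (i.e., points near the decision boundary); drop points surrounded only
    by their own class, which are redundant for a k-NN classifier."""
    kept = []
    for i, p in enumerate(points):
        # Rank all other points by cosine similarity to p, highest first.
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: cosine_similarity(p, points[j]),
            reverse=True,
        )[:k]
        # A nearby opposite-class point means p sits near the boundary.
        if any(labels[j] != labels[i] for j in neighbors):
            kept.append(i)
    return kept
```

For example, with six 2-D points fanned from angle 0° (class 0) to 90° (class 1), only the two points nearest the 45° boundary survive: `reduce_training_set([(1, 0), (1, 0.2), (1, 0.8), (0.8, 1), (0.2, 1), (0, 1)], [0, 0, 0, 1, 1, 1], k=2)` returns `[2, 3]`, a reduction from six training points to two.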