W. Wahyono, I. N. P. Trisna, Sarah Lintang Sariwening, M. Fajar, Danur Wijayanto
Jurnal Teknologi dan Sistem Komputer. Published 2019-11-05. DOI: 10.14710/jtsiskom.8.1.2020.54-58
Comparison of distance measurement on k-nearest neighbour in textual data classification
KNN is one algorithm used to classify textual data in automatic document organization applications, after word representations have been converted into vectors. The distance calculation is essential in the KNN algorithm because it measures how close data elements are to one another. This study compares four distance measures commonly used in KNN: Euclidean, Chebyshev, Manhattan, and Minkowski. The dataset consists of 448 YouTube comments on Eminem. The results show that KNN with Euclidean or Minkowski distance performed better than KNN with Chebyshev or Manhattan distance. KNN achieved its best results with K = 3.
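The abstract describes KNN over vectorized text with four interchangeable distance measures. Below is a minimal Python sketch of that setup, assuming a plain bag-of-words (word-count) representation. The abstract does not state the text representation, the preprocessing, or the value of p used for Minkowski distance, so `build_vocab`, `vectorize`, `knn_predict`, the Minkowski default p = 3, and the toy documents and labels are all hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter

# --- The four distance measures compared in the study ---

def euclidean(a, b):
    """L2 distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """L1 distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """L-infinity distance: largest absolute difference in any one dimension."""
    return max(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p=3.0):
    """Lp distance; p=1 gives Manhattan, p=2 gives Euclidean.
    ASSUMPTION: the abstract does not state p, so p=3 is an arbitrary default."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

# --- Text to vectors (hypothetical bag-of-words; the study's representation may differ) ---

def build_vocab(docs):
    """Map each lower-cased, whitespace-separated token to a column index."""
    words = sorted({w for d in docs for w in d.lower().split()})
    return {w: i for i, w in enumerate(words)}

def vectorize(doc, vocab):
    """Count how often each vocabulary word occurs; unknown words are ignored."""
    vec = [0.0] * len(vocab)
    for w in doc.lower().split():
        if w in vocab:
            vec[vocab[w]] += 1.0
    return vec

# --- KNN classification ---

def knn_predict(train_vecs, train_labels, query, k=3, distance=euclidean):
    """Majority vote among the k training vectors closest to `query`.
    sorted() is stable, so equal distances keep training order; vote ties
    go to the label that appears first among the nearest neighbours."""
    order = sorted(range(len(train_vecs)),
                   key=lambda i: distance(train_vecs[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# --- Usage on invented toy data (NOT the study's 448 YouTube comments) ---
train_docs = [
    "check out my channel",
    "subscribe to my channel",
    "love this song",
    "this song is great",
    "great lyrics love it",
]
train_labels = ["promo", "promo", "song", "song", "song"]  # hypothetical labels

vocab = build_vocab(train_docs)
train_vecs = [vectorize(d, vocab) for d in train_docs]
query = vectorize("love this song so much", vocab)

for name, dist in [("euclidean", euclidean), ("manhattan", manhattan),
                   ("chebyshev", chebyshev), ("minkowski", minkowski)]:
    print(name, knn_predict(train_vecs, train_labels, query, k=3, distance=dist))
```

On this toy data, Euclidean, Manhattan, and Minkowski all return `song`, while Chebyshev returns `promo`: with 0/1 word counts, every document that differs from the query at all is exactly distance 1 away under Chebyshev, so the neighbour choice falls back on tie order. This only illustrates a property of the metric and is not a reproduction of the study's results. Note also that if the study used p = 2 for Minkowski (the abstract does not say), its Minkowski and Euclidean results would be identical by construction.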