Firda A. Setiawati, Q. U. Safitri, A. Huda, Aep Saepulloh, W. Darmalaksana
2019 IEEE 5th International Conference on Wireless and Telematics (ICWT), published 2019-07-01. DOI: 10.1109/ICWT47785.2019.8978221 (https://doi.org/10.1109/ICWT47785.2019.8978221)
Feature Selection using k-Medoid Algorithm for Categorization of Hadith Translation in English
A central problem in document classification is the very large number of features, which depends on the number of terms in the vocabulary. Each document contains only a small fraction of the words in the vocabulary, so most elements of its feature vector are zero. Therefore, a method is proposed that selects a subset of features to represent the full feature set. The method clusters the vocabulary, and the representative of each resulting cluster is used as a feature for every document in the categorization process. Categorization is performed with the k-Nearest Neighbor (k-NN) and Nearest Centroid (NC) algorithms. The data used are English translations of hadith. With this method, computation is expected to be faster and categorization accuracy is expected to improve.
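The pipeline described in the abstract (cluster the vocabulary, keep one representative term per cluster, then classify with k-NN and NC) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy documents and labels are invented, and the raw term counts, cosine distance between terms, Euclidean distance between documents, the deterministic initialization, and the simple alternating k-medoids update are all assumptions made here, since the abstract does not specify them.

```python
import math
from collections import Counter

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 if nu == 0 or nv == 0 else 1.0 - dot / (nu * nv)

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids (assumed variant); returns medoid indices."""
    n = len(points)
    dist = [[0.0 if i == j else cosine_distance(points[i], points[j])
             for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # assumption: first k points as initial medoids
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid's cluster;
        # a medoid always stays in its own cluster, so none is ever empty.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = [min(members, key=lambda c: sum(dist[c][j] for j in members))
                       for members in clusters.values()]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)

def vectorize(tokens, features):
    counts = Counter(tokens)
    return [counts[f] for f in features]

def knn_predict(X, y, x, k=3):
    # Majority vote among the k training documents closest to x.
    nearest = sorted(range(len(X)), key=lambda i: euclidean(X[i], x))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

def nearest_centroid_predict(X, y, x):
    # Assign x to the class whose mean feature vector is closest.
    centroids = {}
    for label in set(y):
        rows = [X[i] for i in range(len(X)) if y[i] == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda label: euclidean(centroids[label], x))

# Toy corpus (invented for illustration; not the paper's data).
docs = [["prayer", "mosque", "prayer"], ["prayer", "mosque"],
        ["fasting", "ramadan", "fasting"], ["fasting", "ramadan"]]
labels = ["prayer", "prayer", "fasting", "fasting"]

vocab = sorted({t for doc in docs for t in doc})
full = [vectorize(doc, vocab) for doc in docs]   # documents x terms
term_vectors = [list(col) for col in zip(*full)]  # one vector per term

# Feature selection: cluster the terms, keep each cluster's medoid term.
selected = [vocab[i] for i in k_medoids(term_vectors, k=2)]
X = [vectorize(doc, selected) for doc in docs]    # reduced feature matrix

query = vectorize(["ramadan", "fasting", "fasting"], selected)
knn_label = knn_predict(X, labels, query, k=3)
nc_label = nearest_centroid_predict(X, labels, query)
print(selected, knn_label, nc_label)
```

In this sketch the reduced matrix has one column per cluster instead of one per vocabulary term, which is where the expected saving in computation time would come from. Whether accuracy also improves depends on the data and is not something the toy example can show.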