A Systematic Approach to Identify an Appropriate Classifier for Limited-Sized Data Sets

Alanoud Bin Dris, Najla Alzakari, H. Kurdi
2019 International Symposium on Networks, Computers and Communications (ISNCC), June 2019
DOI: 10.1109/ISNCC.2019.8909099
Data size is a major issue in any data mining application: limited data yields a small training set, which leads to a poor classification model and therefore poor classification performance. However, many real-life applications need a classifier that handles limited-size data sets appropriately, and considerable interest has focused on how to achieve reasonable classification performance on small data sets. Current work focuses either on enhancing classification algorithms or on enlarging the data sets; these solutions have limitations, such as increased computational time or enlarged data sets that do not reflect the actual population of the real data. This research looks at the problem from a different angle: it aims to address the data-quantity issue by identifying the most appropriate classifier for small data sets among three well-known classifiers, namely Decision Tree (J48), Support Vector Machine (SVM) and Naïve Bayes. Extensive experiments are conducted to examine performance in terms of four measures: accuracy, F-measure, sensitivity and specificity. We used six small data sets from the UCI repository with different numbers of attributes and instances. Results revealed that SVM achieved the best performance on most of the data sets used.
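The four evaluation measures named in the abstract can be computed from a classifier's confusion-matrix counts. The following is a minimal sketch using the standard binary definitions, assuming a designated positive class; the abstract does not say how multi-class UCI data sets were reduced to these measures or which tool produced them, so the names `ConfusionCounts`, `evaluate` and `counts_from_labels` are hypothetical helpers for illustration, not the authors' code.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper, not from the paper)."""
    tp: int  # positives predicted as positive
    fp: int  # negatives predicted as positive
    tn: int  # negatives predicted as negative
    fn: int  # positives predicted as negative


def _safe_div(num: float, den: float) -> float:
    # With very small data sets a class can be absent from a test split;
    # return 0.0 instead of raising ZeroDivisionError.
    return num / den if den else 0.0


def counts_from_labels(y_true: Sequence[Hashable],
                       y_pred: Sequence[Hashable],
                       positive: Hashable) -> ConfusionCounts:
    """Tally counts, treating `positive` as the positive class (an assumption)."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(t == positive and p == positive for t, p in pairs),
        fp=sum(t != positive and p == positive for t, p in pairs),
        tn=sum(t != positive and p != positive for t, p in pairs),
        fn=sum(t == positive and p != positive for t, p in pairs),
    )


def evaluate(c: ConfusionCounts) -> Dict[str, float]:
    """Accuracy, F-measure, sensitivity and specificity from binary counts."""
    total = c.tp + c.fp + c.tn + c.fn
    sensitivity = _safe_div(c.tp, c.tp + c.fn)  # recall, true-positive rate
    specificity = _safe_div(c.tn, c.tn + c.fp)  # true-negative rate
    precision = _safe_div(c.tp, c.tp + c.fp)
    # F-measure: harmonic mean of precision and sensitivity (recall).
    f_measure = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": _safe_div(c.tp + c.tn, total),
        "f_measure": f_measure,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; these are not results from the paper.
    print(evaluate(ConfusionCounts(tp=8, fp=2, tn=7, fn=3)))
```

For the hypothetical counts above, accuracy is 15/20 = 0.75, sensitivity is 8/11, specificity is 7/9, and the F-measure equals 2TP / (2TP + FP + FN) = 16/21. Reporting sensitivity and specificity alongside accuracy matters on small data sets, where a classifier can score well on accuracy simply by favouring the majority class.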