R. Kurniawan, Mohd. Zakree, Ahmad Nazri, M. Irsyad, Rado Yendra, Anis Aklima
2015 International Conference on Electrical Engineering and Informatics (ICEEI), published 2015-08-01. DOI: 10.1109/ICEEI.2015.7352559 (https://doi.org/10.1109/ICEEI.2015.7352559)
On machine learning technique selection for classification
Extracting meaningful patterns from data can be challenging. Irrelevant, redundant, noisy, and unreliable data, misinterpretation of results, and the use of a technique that is ill-suited to extracting unknown patterns from the data may lead an analyst to develop an erroneous classifier. This research is motivated by the 'No Free Lunch' theorem, which can be simplified as: no classification technique works best for every problem. This study compares three main approaches in data mining: Decision Tree (DT), Artificial Neural Network (ANN), and Rough Set Theory (RST). A comparative analysis of these techniques was conducted using the open-source software ROSETTA and WEKA on five different datasets. The sample sizes are categorized according to the number of attributes and the number of instances available in each dataset. The classification models are assessed on accuracy, the number and length of the generated rules, error rate, and standard deviation. Based on nine experiments, the results show that ANN provides better accuracy than the DT and RST approaches, while RST creates more rules and DT generates rules faster than the other techniques. The results illustrate the trade-offs between the approaches and can help other researchers find the best model for a particular problem.
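The abstract names accuracy, error rate, and standard deviation as assessment criteria but does not say how they were computed. A minimal sketch of one common way to derive them from repeated evaluation runs is given below. It assumes error rate is 1 minus mean accuracy and that the standard deviation is taken across runs; the paper may define them differently. The function names, the model names `model_a` and `model_b`, and the per-run accuracy values are all hypothetical placeholders, not results from the study.

```python
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def summarize(run_accuracies):
    """Mean accuracy, error rate (assumed to be 1 - mean accuracy),
    and sample standard deviation across runs."""
    avg = mean(run_accuracies)
    return {
        "accuracy": avg,
        "error_rate": 1.0 - avg,
        "std_dev": stdev(run_accuracies),
    }


# Placeholder per-run accuracies, invented purely for illustration.
runs = {
    "model_a": [0.80, 0.90, 0.85],
    "model_b": [0.70, 0.75, 0.80],
}

summary = {name: summarize(accs) for name, accs in runs.items()}

# Pick the model with the highest mean accuracy.
best = max(summary, key=lambda name: summary[name]["accuracy"])

for name, stats in summary.items():
    print(name, stats)
print("highest mean accuracy:", best)
```

Ranking by mean accuracy alone ignores the other criteria listed in the abstract, such as the number and length of generated rules, so a fuller comparison would weigh those alongside the figures computed here.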