On machine learning technique selection for classification

2015 International Conference on Electrical Engineering and Informatics (ICEEI), Pub Date: 2015-08-01, DOI: 10.1109/ICEEI.2015.7352559

R. Kurniawan, Mohd. Zakree, Ahmad Nazri, M. Irsyad, Rado Yendra, Anis Aklima


Citations: 8

Abstract


Extracting meaningful patterns from data can be challenging. Irrelevant, redundant, noisy, and unreliable data, misinterpretation of results, and the use of a technique ill-suited to extracting unknown patterns from data may lead an analyst to develop an erroneous classifier. This research is motivated by the "No Free Lunch" theorem, which can be simplified as: no classification technique works best for every problem. This study compares three main approaches in data mining: Decision Tree (DT), Artificial Neural Network (ANN), and Rough Set Theory (RST). A comparative analysis of these techniques was conducted using the open-source software ROSETTA and WEKA on five different datasets. The sample sizes are categorized according to the number of attributes and the number of instances available in each dataset. The classification models are assessed on accuracy, the number and length of the generated rules, error rate, and standard deviation. Based on nine experiments, the results show that the Artificial Neural Network provides better accuracy than the Decision Tree and Rough Set approaches, while Rough Set creates more rules and Decision Tree generates rules faster than the other techniques. The results show the trade-offs among the different approaches, which can help other researchers find the best model for a particular problem.
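To make the assessment criteria concrete, the following is a minimal sketch of how accuracy, error rate, and the standard deviation of accuracy across repeated runs might be computed for one classifier. It is not the authors' procedure and does not call ROSETTA or WEKA; the function names `accuracy` and `summarize` and the toy labels in `folds` are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def summarize(fold_results):
    """Summarize per-run (y_true, y_pred) pairs for one classifier.

    Returns mean accuracy, error rate (1 - mean accuracy), and the sample
    standard deviation of the per-run accuracies.
    """
    accs = [accuracy(t, p) for t, p in fold_results]
    mean_acc = mean(accs)
    return {
        "accuracy": mean_acc,
        "error_rate": 1.0 - mean_acc,
        # stdev needs at least two runs; report 0.0 for a single run.
        "std_dev": stdev(accs) if len(accs) > 1 else 0.0,
    }


# Hypothetical toy data: three runs of (true labels, predicted labels).
folds = [
    (["a", "b", "a", "b"], ["a", "b", "a", "a"]),  # 3 of 4 correct
    (["a", "a", "b", "b"], ["a", "a", "b", "b"]),  # 4 of 4 correct
    (["b", "a", "b", "a"], ["a", "a", "b", "b"]),  # 2 of 4 correct
]
summary = summarize(folds)
print(summary)
```

Running the same summary for each technique on each dataset would give comparable figures for the accuracy side of the trade-off; the number and length of generated rules and the rule-generation time would have to be read from the tools themselves.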
