On machine learning technique selection for classification

R. Kurniawan, Mohd. Zakree, Ahmad Nazri, M. Irsyad, Rado Yendra, Anis Aklima
Published in: 2015 International Conference on Electrical Engineering and Informatics (ICEEI)
Publication date: 2015-08-01
DOI: 10.1109/ICEEI.2015.7352559 (https://doi.org/10.1109/ICEEI.2015.7352559)
Citations: 8

Abstract

Extracting meaningful patterns from data can be challenging. Irrelevant, redundant, noisy, and unreliable data, misinterpretation of results, and the use of a technique that is poorly suited to extracting unknown patterns from the data may lead an analyst to develop an erroneous classifier. This research is motivated by the "No Free Lunch" theorem, which can be simplified as: no classification technique works best for every problem. This study compares three main approaches in data mining: Decision Tree (DT), Artificial Neural Network (ANN), and Rough Set Theory (RST). A comparative analysis of these techniques was conducted using the open-source software ROSETTA and WEKA on five different datasets. The sample sizes are categorized in relation to the number of attributes and the number of instances available in the dataset. The classification models are assessed on accuracy, the number and length of the generated rules, error rate, and standard deviation. Based on nine experiments, the results show that ANN provides better accuracy than the DT and RST approaches, while RST creates more rules and DT generates rules faster than the other techniques. The results show the trade-offs among the approaches and can help other researchers find the best model for a particular problem.
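The assessment criteria named in the abstract (accuracy, error rate, and standard deviation across experiments) can be illustrated with a minimal sketch. It uses only the Python standard library. The function names `accuracy` and `summarize` and the per-run scores are hypothetical, invented for illustration; they are not the authors' code and not results from the paper, and the sketch does not reproduce the ROSETTA or WEKA workflows.

```python
from statistics import mean, stdev


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def summarize(accuracies):
    """Mean accuracy, mean error rate, and sample standard deviation over runs."""
    avg = mean(accuracies)
    return {
        "accuracy": avg,
        "error_rate": 1.0 - avg,
        "std_dev": stdev(accuracies),
    }


# Placeholder per-experiment accuracies, invented for illustration only;
# they are NOT results reported in the paper.
runs = {
    "DT": [0.80, 0.90, 0.70],
    "ANN": [0.90, 0.80, 1.00],
}

# Summarize each technique, then pick the one with the highest mean accuracy.
summary = {name: summarize(scores) for name, scores in runs.items()}
best = max(summary, key=lambda name: summary[name]["accuracy"])
```

A comparison like this only ranks techniques on the datasets actually tried, which is consistent with the "No Free Lunch" framing: the technique that comes out on top for one set of runs may not do so for another problem.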