Classification algorithms on a large continuous random dataset using rapid miner tool

Pooja Sharma, Divakar Singh, Anju Singh
2015 2nd International Conference on Electronics and Communication Systems (ICECS), 18 June 2015. DOI: 10.1109/ECS.2015.7125003
Citations: 12

Abstract

Classification is a widely used technique in data mining, and scalability and efficiency are the immediate problems for classification algorithms on large databases. Large amounts of data are generated every day; this data needs to be analysed, and patterns have to be extracted from it to gain knowledge. Classification is a supervised machine learning task that builds a model from labelled training data, and the model is then used to determine the class of new records. There are many types of classification algorithms, such as tree-based algorithms (the C4.5 decision tree, the J48 decision tree, etc.), naïve Bayes and many more. Each of these algorithms has its own pros and cons, depending on many factors such as the characteristics of the data. Classification performance can be measured on the testing data with several metrics, such as accuracy, precision, classification error and kappa. We used a random dataset in the RapidMiner tool for classification, with stratified sampling applied to different classifiers such as J48, C4.5 and naïve Bayes. We analysed the results of the classifiers both with the randomly generated dataset and without it.
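The abstract says that stratified sampling is used when evaluating the J48, C4.5 and naïve Bayes classifiers. The plain-Python sketch below illustrates the idea. Record indices are grouped by class label, and the same fraction is drawn from each group, so the class proportions of the full dataset are preserved in both the training and the test part. The function name `stratified_split`, the default 0.3 test fraction and the fixed seed are illustrative assumptions. They are not the settings used in the paper's RapidMiner process.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=42):
    """Hypothetical helper: return (train_idx, test_idx) with per-class proportions kept.

    The 0.3 default and the seed are assumptions for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)  # group record indices by their class label

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        idxs = idxs[:]  # copy before shuffling so the grouping is not mutated
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)  # same fraction from every class
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = ["a"] * 70 + ["b"] * 30
train, test = stratified_split(labels, test_fraction=0.3, seed=1)
print(len(train), len(test))
```

With 70 records of class `a` and 30 of class `b`, the test part receives 21 and 9 records respectively. This keeps the 70/30 class ratio that a purely random split would only approximate.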
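Because the dataset in the study is continuous, a naïve Bayes classifier has to model numeric attributes somehow. A common choice, assumed here, is a Gaussian likelihood per attribute and class. The paper does not state how RapidMiner's operator handles this internally.

The sketch below works as follows:

- It estimates a class prior plus a per-attribute mean and variance for each class.
- It predicts the class with the highest log-posterior, treating attributes as independent given the class.

The names `fit_gaussian_nb` and `predict_gaussian_nb` and the small variance floor are hypothetical choices for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Hypothetical helper: per class, store (prior, attribute means, attribute variances)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    n = len(y)
    model = {}
    for label, rows in by_class.items():
        prior = len(rows) / n
        columns = list(zip(*rows))  # transpose rows into attribute columns
        means = [sum(col) / len(col) for col in columns]
        # Assumed variance floor (1e-9) avoids division by zero for constant attributes.
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (prior, means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the largest log prior + sum of Gaussian log-likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]), predict_gaussian_nb(model, [5.1, 6.1]))
```

Working in log space avoids numeric underflow when many attribute likelihoods are multiplied together.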
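The four metrics named in the abstract can all be derived from the true and predicted labels of the test data:

- Accuracy is the fraction of correct predictions.
- Classification error is its complement.
- Precision is computed here for one designated positive class.
- Cohen's kappa corrects accuracy for chance agreement, using κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed accuracy and p_e is the agreement expected from the marginal class frequencies.

The function name `evaluate` and the choice to return NaN when kappa is undefined (p_e = 1) are assumptions for this sketch. They are not RapidMiner's Performance operator behaviour.

```python
import math
from collections import Counter


def evaluate(y_true, y_pred, positive):
    """Hypothetical helper: accuracy, classification error, precision and Cohen's kappa."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / n
    error = (n - correct) / n  # classification error = 1 - accuracy

    # Precision for the positive class: TP / (TP + FP).
    predicted_pos = sum(p == positive for p in y_pred)
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    precision = true_pos / predicted_pos if predicted_pos else 0.0

    # Chance agreement p_e from the marginal label counts of truth and prediction.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Kappa is undefined when p_e == 1; NaN is an assumed convention here.
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else math.nan
    return {"accuracy": accuracy, "error": error, "precision": precision, "kappa": kappa}


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(evaluate(y_true, y_pred, positive=1))
```

In this small made-up example, 7 of 10 predictions are correct, giving accuracy 0.7 and classification error 0.3. Of the 5 positive predictions, 3 are right, giving precision 0.6. The marginal counts give p_e = 0.5, so kappa is about 0.4. This is noticeably lower than the raw accuracy because half of the agreement would be expected by chance.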