Classification algorithms on a large continuous random dataset using rapid miner tool

2015 2nd International Conference on Electronics and Communication Systems (ICECS), Pub Date: 2015-06-18, DOI: 10.1109/ECS.2015.7125003

Pooja Sharma, Divakar Singh, Anju Singh

Citations: 12

Abstract

Classification is a widely used technique in data mining, and for large databases, scalability and efficiency are the most pressing problems for classification algorithms. Large amounts of data are generated every day; this data needs to be analysed, and patterns must be extracted from it to obtain useful knowledge. Classification is a supervised machine learning task that builds a model from labelled training data; the model is then used to determine the class of new instances. There are many types of classification algorithms, such as tree-based algorithms (e.g. the C4.5 and J48 decision trees) and naïve Bayes. Each has its own strengths and weaknesses, depending on factors such as the characteristics of the data. Classification performance can be measured on the test data with several metrics, such as accuracy, precision, classification error and kappa. We used a random dataset in the RapidMiner tool for classification, applying stratified sampling with several classifiers: J48, C4.5 and naïve Bayes. We analysed the results of the classifiers both with and without the randomly generated dataset.
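The abstract relies on stratified sampling and on four evaluation metrics (accuracy, precision, classification error and kappa). The following is a minimal, standard-library-only Python sketch of how these quantities are commonly computed; it is not RapidMiner's implementation, and all function names (`stratified_split`, `accuracy`, `classification_error`, `precision`, `cohen_kappa`) are hypothetical. Precision is computed for a single caller-chosen positive class and returns 0.0 when that class is never predicted; both choices are assumptions. Kappa is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is chance agreement from the class marginals.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Split indices into (train, test) so each class keeps roughly the
    same proportion in both parts (stratified sampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        # Sample the test share separately within each class.
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def classification_error(y_true, y_pred):
    """Fraction of misclassified instances (1 - accuracy)."""
    return 1.0 - accuracy(y_true, y_pred)


def precision(y_true, y_pred, positive):
    """Of the instances predicted as `positive`, the fraction truly positive.
    Assumption: returns 0.0 if `positive` is never predicted."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_pos:
        return 0.0
    return sum(t == positive for t in predicted_pos) / len(predicted_pos)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of P(true=c) * P(pred=c).
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Both label lists consist of one identical class; treat as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)
```

For example, with `y_true = ["yes", "yes", "no", "no", "yes", "no"]` and `y_pred = ["yes", "no", "no", "no", "yes", "yes"]`, four of six predictions are correct (accuracy 2/3, classification error 1/3), two of the three "yes" predictions are right (precision 2/3), and chance agreement is 0.5, so kappa is (2/3 − 0.5) / 0.5 = 1/3.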
