Authors: Pooja Sharma, Divakar Singh, Anju Singh
Published in: 2015 2nd International Conference on Electronics and Communication Systems (ICECS)
Publication date: 2015-06-18
DOI: 10.1109/ECS.2015.7125003
URL: https://doi.org/10.1109/ECS.2015.7125003
Classification algorithms on a large continuous random dataset using rapid miner tool
Classification is a widely used technique in data mining, where scalability and efficiency are immediate concerns for classification algorithms applied to large databases. Nowadays large amounts of data are generated that need to be analysed, and patterns have to be extracted from them to gain knowledge. Classification is a supervised machine learning task that builds a model from labelled training data. The model is then used to determine the class of new instances. There are many types of classification algorithms, such as tree-based algorithms (the C4.5 decision tree, the J48 decision tree, etc.), naive Bayes and many more. Each of these algorithms has its own pros and cons, depending on factors such as the characteristics of the data. Classification performance can be measured on the testing data using several metrics, such as accuracy, precision, classification error and kappa. We used a random dataset in the RapidMiner tool for classification. Stratified sampling is used with the different classifiers: J48, C4.5 and naive Bayes. We analysed the results of the classifiers both with the randomly generated dataset and without it.
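The stratified sampling and the four metrics named above can be illustrated with a minimal Python sketch. This is not the RapidMiner workflow used in the paper: the function names `stratified_split` and `evaluate`, the 30% test fraction and the toy labels are all assumptions made for illustration, and precision is computed for a single class chosen by the caller.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly the same share.

    Hypothetical helper: the 0.3 default is an assumption, not a value from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def evaluate(y_true, y_pred, positive):
    """Compute accuracy, classification error, precision and Cohen's kappa."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Precision for the chosen class: true positives / all predicted positives.
    predicted_pos = sum(p == positive for p in y_pred)
    true_pos = sum(t == p == positive for t, p in zip(y_true, y_pred))
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    # Agreement expected by chance, from the marginal label frequencies.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0
    return {
        "accuracy": accuracy,
        "classification_error": 1 - accuracy,
        "precision": precision,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    labels = ["a"] * 70 + ["b"] * 30
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))

    y_true = ["a"] * 6 + ["b"] * 4
    y_pred = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "a"]
    print(evaluate(y_true, y_pred, positive="a"))
```

Kappa corrects accuracy for the agreement that would occur by chance, so on imbalanced data it can be much lower than accuracy, which is one reason to report the two together.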