Muhammad Misdram, E. Noersasongko, A. Syukur, Purwanto Faculty, Muljono Muljono, Heru Agus Santoso, De Rosal Ignatius Moses Setiadi
2020 International Seminar on Application for Technology of Information and Communication (iSemantic), published 19 September 2020. DOI: 10.1109/iSemantic50169.2020.9234225
Analysis of Imputation Methods of Small and Unbalanced Datasets in Classifications using Naïve Bayes and Particle Swarm Optimization
Classification methods in data mining need a good learning process to achieve optimal accuracy. This is possible when the dataset is ideal, balanced, and contains many records, but in practice such datasets are difficult to obtain. Imputation is one way to fill in missing values in a dataset that is not ideal. A large number of missing values can reduce the number of records available for learning and thereby affect accuracy. This research analyzes the effects of zero imputation and mean imputation on classification accuracy in small datasets, using the Naïve Bayes classifier (NBC) and an NBC optimized with Particle Swarm Optimization (PSO). Tests were carried out on five datasets from the UCI repository; one of them required no imputation because it has no missing values. The results show that PSO improved NBC classification accuracy on all datasets, while imputation improved classification accuracy by up to 4.33% on the Biomarker dataset.
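To make the difference between zero and mean imputation concrete, the following is a minimal sketch, not the authors' implementation (the abstract does not say which tool or NBC variant they used). It assumes numeric features with missing values encoded as `None` and a Gaussian Naïve Bayes model; the toy data and the function names `zero_impute`, `mean_impute`, `fit_gaussian_nb`, and `predict` are hypothetical. The PSO optimization step is not shown.

```python
import math
from statistics import mean
from typing import List, Optional

# Hypothetical toy data: numeric features, None marks a missing value.
X = [[1.0, None], [1.2, 2.0], [None, 2.2], [5.0, 8.0], [5.5, None], [4.8, 7.5]]
y = ["a", "a", "a", "b", "b", "b"]


def zero_impute(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Zero imputation: replace every missing value with 0.0."""
    return [[0.0 if v is None else v for v in row] for row in rows]


def mean_impute(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Mean imputation: replace a missing value with its column's observed mean."""
    col_means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def fit_gaussian_nb(X, y):
    """Estimate each class's prior and per-feature Gaussian mean and variance."""
    model = {}
    for c in sorted(set(y)):
        Xc = [x for x, label in zip(X, y) if label == c]
        stats = []
        for j in range(len(X[0])):
            col = [x[j] for x in Xc]
            mu = mean(col)
            # Population variance plus a tiny term so it is never zero.
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mu, var))
        model[c] = (len(Xc) / len(X), stats)
    return model


def predict(model, x):
    """Pick the class with the highest log posterior, treating features as independent."""
    best, best_score = None, -math.inf
    for c, (prior, stats) in model.items():
        score = math.log(prior)
        for v, (mu, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best


# Compare both imputation strategies (training accuracy only, for illustration).
for name, impute in [("zero", zero_impute), ("mean", mean_impute)]:
    Xi = impute(X)
    model = fit_gaussian_nb(Xi, y)
    acc = sum(predict(model, x) == t for x, t in zip(Xi, y)) / len(y)
    print(name, "imputation, training accuracy:", acc)
```

Both strategies keep every record, which is the point the abstract makes against discarding incomplete rows. In a proper evaluation, the column means for mean imputation should be computed on the training split only and then applied to the test split, so that no information leaks from test data into training.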