Ensemble Learning and Its Application in Spam Detection

Arka Ghosh, Raja Das, Shreyashi Dey, Gautam Mahapatra

2023 International Conference on Computer, Electrical & Communication Engineering (ICCECE), 20 January 2023. DOI: 10.1109/ICCECE51049.2023.10085378
A single model is not always sufficient to classify an email. Every spam email has features that distinguish it from regular mail, but an individual model may fail to use those features for classification and can therefore misclassify. It is essential to cross-verify the output of one model against that of another, which is what ensemble learning does. Traditionally, ensembles have combined repeated instances of the same model or different variants of one model. In this paper, we instead combine four entirely different models through max (majority) voting to improve the result: Support Vector Machine (SVM), Multinomial Naïve Bayes (MNB), Random Forest (RF), and Decision Tree (DT). After testing all possible combinations, we conclude that the combination of SVM, MNB, and DT gives the best result.
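The abstract gives no implementation details, but a minimal sketch of such a hard (max) voting ensemble, assuming scikit-learn and an illustrative toy corpus (the dataset, vectorizer, and hyperparameters below are assumptions for illustration, not the paper's actual setup), might look like this:

```python
# Sketch of a max-voting spam classifier combining SVM, MNB, and DT,
# the combination the abstract reports as best. The toy emails and the
# CountVectorizer preprocessing are placeholders, not the paper's data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline

# Placeholder corpus: 1 = spam, 0 = ham.
emails = [
    "win a free prize now, click here",
    "limited offer, claim your reward today",
    "meeting rescheduled to 3pm tomorrow",
    "please review the attached project report",
    "congratulations you won a lottery, send details",
    "lunch at the usual place on friday?",
]
labels = [1, 1, 0, 0, 1, 0]

# Hard ("max") voting: each classifier casts one vote per email and the
# majority label wins. A Random Forest could be added as a fourth voter
# in the same way, as the paper does when testing combinations.
ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(kernel="linear")),
        ("mnb", MultinomialNB()),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",
)

model = make_pipeline(CountVectorizer(), ensemble)
model.fit(emails, labels)

# Likely output on this toy data: [1 0]
print(model.predict(["claim your free reward now", "see you at the meeting"]))
```

With hard voting, an odd number of voters (here three) avoids ties, which is one practical reason a three-model combination such as SVM + MNB + DT can behave more predictably than all four models together.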