Multiclass Email Classification by Using Ensemble Bagging and Ensemble Voting

Ali Helmut, D. Murdiansyah
{"title":"基于集合Bagging和集合投票的多类电子邮件分类","authors":"Ali Helmut, D. Murdiansyah","doi":"10.33387/jiko.v6i2.6394","DOIUrl":null,"url":null,"abstract":"Email is a common communication technology in modern life. The more emails we receive, the more difficult and time consuming it is to sort them out. One solution to overcome this problem is to create a system using machine learning to sort emails. Each method of machine learning and data sampling result in different performance. Ensemble learning is a method of combining several learning models into one model to get better performance. In this study we tried to create a multiclass email classification system by combining learning models, data sampling, and several data classes to obtain the effect of Ensemble Bagging and Ensemble Voting methods on the performance of the macro average f1 score, and compare it with non-ensemble models. The results of this study show that the sensitivity of Naïve Bayes to imbalance data is helped by the Ensemble Bagging and Ensemble Voting method with ∆P (delta performance) of range 0.0001 – 0.0018. Logistic Regression has performance with Ensemble Bagging and Ensemble Voting by ∆P of range 0.0001-0.00015. Decision Tree has lowest performance compared to others with ∆P of -0.01","PeriodicalId":243297,"journal":{"name":"JIKO (Jurnal Informatika dan Komputer)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multiclass Email Classification by Using Ensemble Bagging and Ensemble Voting\",\"authors\":\"Ali Helmut, D. Murdiansyah\",\"doi\":\"10.33387/jiko.v6i2.6394\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Email is a common communication technology in modern life. The more emails we receive, the more difficult and time consuming it is to sort them out. One solution to overcome this problem is to create a system using machine learning to sort emails. Each method of machine learning and data sampling result in different performance. Ensemble learning is a method of combining several learning models into one model to get better performance. In this study we tried to create a multiclass email classification system by combining learning models, data sampling, and several data classes to obtain the effect of Ensemble Bagging and Ensemble Voting methods on the performance of the macro average f1 score, and compare it with non-ensemble models. The results of this study show that the sensitivity of Naïve Bayes to imbalance data is helped by the Ensemble Bagging and Ensemble Voting method with ∆P (delta performance) of range 0.0001 – 0.0018. Logistic Regression has performance with Ensemble Bagging and Ensemble Voting by ∆P of range 0.0001-0.00015. 
Decision Tree has lowest performance compared to others with ∆P of -0.01\",\"PeriodicalId\":243297,\"journal\":{\"name\":\"JIKO (Jurnal Informatika dan Komputer)\",\"volume\":\"26 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-08-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JIKO (Jurnal Informatika dan Komputer)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.33387/jiko.v6i2.6394\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JIKO (Jurnal Informatika dan Komputer)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.33387/jiko.v6i2.6394","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Email is a common communication technology in modern life. The more emails we receive, the more difficult and time-consuming it becomes to sort them. One solution to this problem is to build a system that uses machine learning to sort emails. Different machine learning methods and data sampling strategies result in different performance. Ensemble learning combines several learning models into a single model to obtain better performance. In this study, we built a multiclass email classification system by combining learning models, data sampling, and several data classes to measure the effect of the Ensemble Bagging and Ensemble Voting methods on the macro-averaged F1 score, and compared them with non-ensemble models. The results show that the sensitivity of Naïve Bayes to imbalanced data is mitigated by the Ensemble Bagging and Ensemble Voting methods, with ∆P (delta performance) in the range 0.0001 – 0.0018. Logistic Regression gains ∆P in the range 0.0001 – 0.00015 with Ensemble Bagging and Ensemble Voting. Decision Tree has the lowest performance of the three, with ∆P of -0.01.
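As an illustration of the approach described in the abstract, the sketch below compares a single classifier against Ensemble Bagging and Ensemble Voting over Naïve Bayes, Logistic Regression, and Decision Tree base models, scored with the macro-averaged F1 metric. It uses scikit-learn with placeholder email data; the dataset, preprocessing, and hyperparameters are assumptions for illustration, not the authors' exact pipeline.

# Illustrative sketch only (not the authors' exact pipeline): comparing a single
# Naive Bayes classifier against Ensemble Bagging and Ensemble Voting on a small
# placeholder multiclass email dataset, scored with the macro-averaged F1 score.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.metrics import f1_score

# Placeholder emails and labels; the study uses its own multiclass email dataset.
emails = [
    "quarterly report attached", "team meeting moved to 10am", "please review the draft",
    "cheap meds available now", "you won a free prize", "claim your reward today",
    "family dinner on sunday", "photos from the trip", "see you at the party",
]
labels = ["work", "work", "work", "spam", "spam", "spam", "personal", "personal", "personal"]

# Split raw texts first, then vectorize (fit TF-IDF on the training split only).
train_txt, test_txt, y_train, y_test = train_test_split(
    emails, labels, test_size=1/3, stratify=labels, random_state=42
)
vec = TfidfVectorizer()
X_train = vec.fit_transform(train_txt)
X_test = vec.transform(test_txt)

models = {
    # Non-ensemble baseline.
    "naive_bayes": MultinomialNB(),
    # Ensemble Bagging: bootstrap-resampled copies of the same base learner.
    "bagging_nb": BaggingClassifier(MultinomialNB(), n_estimators=10, random_state=42),
    # Ensemble Voting: majority vote over heterogeneous base learners.
    "voting": VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("lr", LogisticRegression(max_iter=1000)),
            ("dt", DecisionTreeClassifier(random_state=42)),
        ],
        voting="hard",
    ),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    macro_f1 = f1_score(y_test, model.predict(X_test), average="macro")
    print(f"{name}: macro F1 = {macro_f1:.4f}")

In this setup, ∆P for an ensemble would be its macro F1 minus that of the corresponding non-ensemble baseline, which is how the small positive and negative differences reported in the abstract can be read.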