Detecting Phishing Emails Using Random Forest and AdaBoost Classifier Model
Fredrick Nthurima, Abraham Mutua, Waithaka Stephen Titus
Open Journal for Information Technology, published 2023-11-14. DOI: https://doi.org/10.32591/coas.ojit.0602.03123n
Abstract
A phishing attack occurs when a legitimate-looking email lures the recipient into believing it is genuine and into opening it and clicking malicious links embedded in it. This leads the user to reveal sensitive information, such as credit card numbers, usernames, or passwords, to the attacker, who thereby gains entry into the compromised account. Online surveys rank phishing as the leading attack on web content, mostly targeting financial institutions. According to a 2017 survey conducted by Ponemon Institute LLC, losses due to phishing attacks amount to about $1.5 billion per year. Phishing is a global threat to information security that is on the rise due to the Internet of Things (IoT), and it therefore requires a better detection mechanism to mitigate these losses and the accompanying reputational damage. This research paper explores and reports the use of a combination of two machine learning algorithms, Random Forest and AdaBoost, together with a larger set of phishing email features, to improve the accuracy of phishing detection and prevention. The project will explore existing phishing methods, investigate the effect of combining two machine learning algorithms to detect and prevent phishing attacks, design and develop a supervised classifier that can detect and prevent phishing emails, and test the model with existing data. A dataset consisting of both benign and phishing emails will be used to train the model through supervised learning. The expected accuracy is 99.9%, with False Negative (FN) and False Positive (FP) rates of 0.1% or below.
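The accuracy, FN, and FP targets stated in the abstract can be checked against a model's predictions with a few lines of code. The following is a minimal sketch, not the authors' evaluation procedure: the function names `phishing_metrics` and `meets_target`, the label encoding (1 = phishing, 0 = benign), and the per-class definitions of the FP and FN rates are all assumptions made for illustration, and the toy labels are not data from the paper.

```python
from typing import Dict, Sequence


def phishing_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, FP rate, and FN rate for binary predictions.

    Assumed encoding: 1 = phishing, 0 = benign.
    Assumed definitions: FP rate = FP / (FP + TN), FN rate = FN / (FN + TP).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed

    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "fp_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "fn_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


def meets_target(metrics: Dict[str, float],
                 min_accuracy: float = 0.999,
                 max_error_rate: float = 0.001) -> bool:
    """Check the abstract's stated targets: accuracy of at least 99.9%
    and FP and FN rates of at most 0.1%."""
    return (metrics["accuracy"] >= min_accuracy
            and metrics["fp_rate"] <= max_error_rate
            and metrics["fn_rate"] <= max_error_rate)


if __name__ == "__main__":
    # Toy labels for illustration only; not results from the paper.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    m = phishing_metrics(y_true, y_pred)
    print(m, meets_target(m))
```

If the FP and FN rates are instead meant as fractions of all emails, the denominators would be the total count rather than the per-class counts; the abstract does not say which definition is intended.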