Lambda Architecture-Based Big Data System for Large-Scale Targeted Social Engineering Email Detection

International Journal of Information Security Science. Pub Date: 2023-09-30. DOI: 10.55859/ijiss.1338813

Mustafa Umut DEMİREZEN, Tuğba SELCEN NAVRUZ

Abstract

In this research, we address targeted social engineering email detection with an approach built on the Lambda Architecture (LA). Our method splits the BERT model into two components: an embedding generator and a classification segment. This split reduces resource consumption and improves system efficiency. In an empirical comparison of the fastText and BERT models, BERT performed better. Specifically, the BERT model achieved high precision for both malicious and benign emails, with similarly high recall and F1 scores. Its overall accuracy was 0.9988, with a Matthews Correlation Coefficient of 0.9978. In comparison, the fastText model showed lower precision. Following principles of the Lambda architecture, we also examine the runtime performance of the data processing models. The Separated-BERT (Sep-BERT) model handles both real-time (stream) and large-scale (batch) data processing. Compared to the traditional BERT, Sep-BERT consumed less memory and CPU across diverse email sizes and ingestion rates. This efficiency, combined with rapid inference times, positions Sep-BERT as a scalable and cost-effective solution that fits the demands of Lambda-inspired architectures. By introducing this methodology and demonstrating its efficacy in detecting targeted social engineering emails, we lay a foundation for future research on integrating big data frameworks with machine learning models.
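The split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `toy_embed` and `toy_classify` are hypothetical stand-ins for the BERT embedding generator and the classification segment, and the shared embedding store is an assumption about how a batch path and a stream path might reuse the expensive step. All names below are invented for illustration.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


def toy_embed(text: str, dim: int = 8) -> Vector:
    """Hypothetical stand-in for the embedding generator (this is NOT BERT)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def toy_classify(embedding: Vector, threshold: float = 0.5) -> str:
    """Hypothetical stand-in for the lightweight classification segment."""
    score = sum(embedding) / len(embedding)
    return "malicious" if score >= threshold else "benign"


class SeparatedPipeline:
    """Keeps the costly embedding step apart from the cheap classification step.

    Assumption: embeddings are stored by email id so that the stream path
    and the batch path never embed the same email twice.
    """

    def __init__(self, embed: Callable[[str], Vector],
                 classify: Callable[[Vector], str]) -> None:
        self.embed = embed
        self.classify = classify
        self.store: Dict[str, Vector] = {}
        self.embed_calls = 0  # counts how often the costly step actually ran

    def _embedding_for(self, email_id: str, body: str) -> Vector:
        if email_id not in self.store:
            self.store[email_id] = self.embed(body)
            self.embed_calls += 1
        return self.store[email_id]

    def process_stream(self, email_id: str, body: str) -> str:
        """Stream path: classify a single incoming email."""
        return self.classify(self._embedding_for(email_id, body))

    def process_batch(self, emails: Dict[str, str]) -> Dict[str, str]:
        """Batch path: classify many emails, reusing stored embeddings."""
        return {
            email_id: self.classify(self._embedding_for(email_id, body))
            for email_id, body in emails.items()
        }


pipeline = SeparatedPipeline(toy_embed, toy_classify)
stream_label = pipeline.process_stream("msg-1", "Please verify your account")
batch_labels = pipeline.process_batch({
    "msg-1": "Please verify your account",
    "msg-2": "Meeting moved to 3 pm",
})
```

In this sketch, `msg-1` is embedded once on the stream path and reused on the batch path, so `embed_calls` ends at 2 rather than 3. Whether Sep-BERT shares embeddings between layers in this way is not stated in the abstract.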
