AI & ML Based Anamoly Detection and Response Using Ember Dataset

2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO). Pub Date: 2021-09-03. DOI: 10.1109/icrito51393.2021.9596451

Viraj Rathod, C. Parekh, Dharati Dholariya



Abstract

In an era of rapid technological growth, malicious traffic has drawn increased attention. Most well-known offensive security assessments today focus heavily on the pre-compromise phase. The volume of anomalous data is now massive, and analyzing it with primitive, manual methods is highly challenging. Our proposed solution rests on early detection: if adversary behaviors can be detected in the early stage of a compromise, defenders can prevent and safeguard against a range of attacks, including ransomware and zero-day attacks. Integrating Artificial Intelligence and Machine Learning with manual anomaly detection can provide automated, machine-based detection, which in turn can yield a fast, low-error, simple, and scalable Threat Detection & Response system. Endpoint Detection & Response (EDR) tools provide a unified view of complex intrusions, using known adversarial behaviors to identify intrusion events. We use the EMBER dataset, a labelled benchmark dataset for training machine learning models to detect malicious portable executable (PE) files. The dataset consists of features derived from 1.1 million binary files: 900,000 training samples (300,000 malicious, 300,000 benign, and 300,000 unlabelled) and 200,000 evaluation samples (100,000 malicious and 100,000 benign). We have also included open-source code for extracting features from additional binaries, so that features from new samples can be added to the dataset.
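The split sizes quoted in the abstract can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the counts are copied from the abstract, while the label encoding (1 = malicious, 0 = benign, -1 = unlabelled) and the helper names `split_total` and `drop_unlabeled` are assumptions introduced here for illustration.

```python
# Split sizes as stated in the abstract (not re-derived from the data itself).
EMBER_SPLITS = {
    "train": {"malicious": 300_000, "benign": 300_000, "unlabeled": 300_000},
    "test": {"malicious": 100_000, "benign": 100_000},
}

# Assumed label encoding: 1 = malicious, 0 = benign, -1 = unlabelled.
UNLABELED = -1


def split_total(split):
    """Total number of samples in one split, summed over its classes."""
    return sum(EMBER_SPLITS[split].values())


def drop_unlabeled(features, labels):
    """Keep only rows with a known label, as supervised training requires."""
    kept = [(x, y) for x, y in zip(features, labels) if y != UNLABELED]
    return [x for x, _ in kept], [y for _, y in kept]


# Grand total across both splits, and the part of the training split
# that is usable for supervised learning once unlabelled rows are removed.
total = sum(split_total(s) for s in EMBER_SPLITS)
supervised_train = split_total("train") - EMBER_SPLITS["train"]["unlabeled"]
```

Under these figures, only 600,000 of the 900,000 training samples carry a label, so a supervised classifier would be trained on two thirds of the training split unless the unlabelled samples are used in some semi-supervised way.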
