Finding Critical Files from a Packet

IEEE INFOCOM 2021 - IEEE Conference on Computer Communications. Pub Date: 2021-05-10. DOI: 10.1109/INFOCOM42981.2021.9488914

Junnyung Hur, Hahoon Jeon, Hyeon Gy Shon, Young Jae Kim, Myungkeun Yoon


Citations: 1

Abstract

Network-based intrusion detection and data leakage prevention systems inspect packets to detect whether critical files, such as malware or confidential documents, are being transferred. However, this kind of detection requires heavy computing resources to reassemble packets, and only well-known protocols can be interpreted. In addition, finding similar files in storage requires pairwise comparisons. In this paper, we present a new network-based file identification scheme that inspects packets independently, without reassembly, and finds similar files through inverted indexing instead of pairwise comparison. We use a content-based chunking algorithm to consistently divide both files and packets into multiple byte sequences, called chunks. If a packet is part of a file, the packet and the file share common chunks. The challenge is that packet chunking and inverted-index search must be fast and scalable enough for packet processing, and file identification must remain accurate even though many chunks are noise. We use a small Bloom filter and a delayed query strategy to solve these problems. To the best of our knowledge, this is the first scheme that identifies a specific critical file from a packet over unknown protocols. Experimental results show that the proposed scheme can successfully identify a critical file from a packet.
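The pipeline outlined in the abstract (content-based chunking, a Bloom filter as a cheap pre-filter, and an inverted index from chunks to files) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the window size, boundary mask, minimum chunk length, Bloom filter parameters, hash functions, and the names `chunks`, `Bloom`, `build`, and `identify` are all assumptions made for illustration. The delayed query strategy is not specified in the abstract and is not modeled here. A per-window hash stands in for the rolling hash that a fast implementation would likely use.

```python
import hashlib
import random
from collections import Counter, defaultdict

WINDOW = 16     # bytes hashed at each position (assumed value)
MASK = 0x3F     # boundary when the low 6 bits are zero, ~64-byte chunks (assumed)
MIN_CHUNK = 16  # chunks shorter than this are skipped (assumed)


def chunks(data: bytes):
    """Yield the byte sequences between content-defined boundaries.

    A boundary depends only on the WINDOW bytes before it, so a packet cut
    from the middle of a file finds the same boundaries as the file does.
    """
    start = 0
    for i in range(WINDOW, len(data) + 1):
        digest = hashlib.blake2b(data[i - WINDOW:i], digest_size=4).digest()
        if int.from_bytes(digest, "big") & MASK == 0:
            if i - start >= MIN_CHUNK:
                yield data[start:i]
            start = i


class Bloom:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, bits: int = 1 << 16, k: int = 3):
        self.bits, self.k = bits, k
        self.array = bytearray(bits // 8)

    def _positions(self, item: bytes):
        for seed in range(self.k):
            d = hashlib.blake2b(bytes([seed]) + item, digest_size=8).digest()
            yield int.from_bytes(d, "big") % self.bits

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.array[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: bytes) -> bool:
        return all(self.array[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))


def build(files: dict):
    """Index every chunk of every critical file: chunk digest -> file names."""
    bloom, index = Bloom(), defaultdict(set)
    for name, content in files.items():
        for c in chunks(content):
            key = hashlib.sha1(c).digest()
            bloom.add(key)
            index[key].add(name)
    return bloom, index


def identify(packet: bytes, bloom: Bloom, index: dict):
    """Return the file sharing the most chunks with the packet, or None."""
    votes = Counter()
    for c in chunks(packet):
        key = hashlib.sha1(c).digest()
        if key in bloom:                     # cheap pre-filter drops most noise
            for name in index.get(key, ()):  # inverted-index lookup
                votes[name] += 1
    return votes.most_common(1)[0][0] if votes else None


# Usage: two synthetic "critical files" and one payload cut from the second.
rng = random.Random(0)
files = {name: bytes(rng.getrandbits(8) for _ in range(20000))
         for name in ("a.bin", "b.bin")}
bloom, index = build(files)
packet = files["b.bin"][7001:8461]  # 1460 bytes from the middle of b.bin
print(identify(packet, bloom, index))
```

The leading bytes of the packet, before its first boundary, form a chunk that does not correspond to any file chunk. This is one source of the noise the abstract mentions. Such chunks simply miss in the Bloom filter or the index, and identification relies on the remaining chunks, which needs no reassembly and no knowledge of the transport protocol.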

