Kyle Stein, Andrew A. Mahyari, Guillermo Francia III, Eman El-Sheikh
arXiv - CS - Cryptography and Security, 2024-09-17, https://doi.org/arxiv-2409.11254
Towards Novel Malicious Packet Recognition: A Few-Shot Learning Approach
As the complexity and connectivity of networks increase, the need for novel
malware detection approaches becomes imperative. Traditional security defenses
are becoming less effective against the advanced tactics of today's
cyberattacks. Deep Packet Inspection (DPI) has emerged as a key technology in
strengthening network security, offering detailed analysis of network traffic
that goes beyond simple metadata analysis. DPI examines not only the packet
headers but also the payload content within, offering a thorough insight into
the data traversing the network. This study proposes a novel approach that
leverages a large language model (LLM) and few-shot learning to accurately
recognize novel, unseen malware types from only a few labeled samples. Our proposed
approach uses an LLM pretrained on known malware types to extract embeddings
from packets. The embeddings are then used alongside a few labeled samples of an
unseen malware type. This technique is designed to acclimate the model to
different malware representations, further enabling it to generate robust
embeddings for both trained and unseen classes. Following the extraction of
embeddings from the LLM, few-shot learning is utilized to enhance performance
with minimal labeled data. Our evaluation, which utilized two renowned
datasets, focused on identifying malware types within network traffic and
Internet of Things (IoT) environments. Our approach shows promising results
with an average accuracy of 86.35% and F1-Score of 86.40% on different malware
types across the two datasets.
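
To make the few-shot step concrete, the following is a minimal sketch of one common way to classify embeddings from a handful of labeled samples: average the support embeddings of each class into a prototype and assign each query to the nearest prototype. The abstract does not state which few-shot method is used, so the prototype-based classifier here is an assumption, not the authors' algorithm. The function names (class_prototypes, predict, synthetic_embeddings) are hypothetical, and the embeddings are synthetic random vectors standing in for LLM packet embeddings.

```python
import numpy as np


def class_prototypes(support_x, support_y):
    """Average the support embeddings of each class into one prototype."""
    labels = np.unique(support_y)
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in labels])
    return labels, protos


def predict(query_x, labels, protos):
    """Assign each query embedding to the class with the nearest prototype."""
    # Squared Euclidean distance between every query and every prototype.
    diffs = query_x[:, None, :] - protos[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    return labels[dists.argmin(axis=1)]


def synthetic_embeddings(n_per_class, n_classes=3, dim=16, seed=0):
    """Hypothetical stand-in for LLM embeddings: one noisy cluster per class."""
    rng = np.random.default_rng(seed)
    centers = 10.0 * np.eye(n_classes, dim)  # well-separated class centers
    y = np.repeat(np.arange(n_classes), n_per_class)
    x = centers[y] + 0.5 * rng.standard_normal((y.size, dim))
    return x, y


if __name__ == "__main__":
    # Five labeled samples per class play the role of the few-shot support set.
    support_x, support_y = synthetic_embeddings(n_per_class=5, seed=0)
    query_x, query_y = synthetic_embeddings(n_per_class=20, seed=1)
    labels, protos = class_prototypes(support_x, support_y)
    accuracy = (predict(query_x, labels, protos) == query_y).mean()
    print(f"accuracy on synthetic queries: {accuracy:.2f}")
```

In a real pipeline the synthetic vectors would be replaced by embeddings extracted from packets, and the synthetic accuracy printed here says nothing about the results reported above.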