Luca Gioacchini, Luca Vassio, Marco Mellia, Idilio Drago, Zied Ben Houidi
{"title":"i-DarkVec:暗网流量分析的增量嵌入","authors":"Luca Gioacchini, Luca Vassio, Marco Mellia, Idilio Drago, Zied Ben Houidi","doi":"https://dl.acm.org/doi/10.1145/3595378","DOIUrl":null,"url":null,"abstract":"<p>Darknets are probes listening to traffic reaching IP addresses that host no services. Traffic reaching a darknet results from the actions of internet scanners, botnets and possibly misconfigured hosts. Such peculiar nature of the darknet traffic makes darknets a valuable instrument to discover malicious online activities, e.g., identifying coordinated actions performed by bots or scanners. However, the massive amount of packets and sources that darknets observe makes it hard to extract meaningful insights, calling for scalable tools to automatically identify and group sources that share similar behaviour. </p><p>We here present i-DarkVec, a methodology to learn meaningful representations of Darknet traffic. i-DarkVec leverages Natural Language Processing techniques (e.g., Word2Vec) to capture the co-occurrence patterns that emerge when scanners or bots launch coordinated actions. As in NLP problems, the embeddings learned with i-DarkVec enable several new machine learning tasks on the darknet traffic, such as identifying clusters of senders engaged in similar activities. </p><p>We extensively test i-DarkVec and explore its design space in a case study using real darknets. We show that with a proper definition of <i>services</i>, the learned embeddings can be used to (i) solve the classification problem to associate unknown sources’ IP addresses to the correct classes of coordinated actors, and (ii) automatically identify clusters of previously unknown sources performing similar attacks and scans, easing the security analyst’s job. i-DarkVec leverages a novel incremental embedding learning approach that is scalable and robust to traffic changes, making it applicable to dynamic and large-scale scenarios.</p>","PeriodicalId":50911,"journal":{"name":"ACM Transactions on Internet Technology","volume":"115 1","pages":""},"PeriodicalIF":3.9000,"publicationDate":"2023-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"i-DarkVec: Incremental Embeddings for Darknet Traffic Analysis\",\"authors\":\"Luca Gioacchini, Luca Vassio, Marco Mellia, Idilio Drago, Zied Ben Houidi\",\"doi\":\"https://dl.acm.org/doi/10.1145/3595378\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Darknets are probes listening to traffic reaching IP addresses that host no services. Traffic reaching a darknet results from the actions of internet scanners, botnets and possibly misconfigured hosts. Such peculiar nature of the darknet traffic makes darknets a valuable instrument to discover malicious online activities, e.g., identifying coordinated actions performed by bots or scanners. However, the massive amount of packets and sources that darknets observe makes it hard to extract meaningful insights, calling for scalable tools to automatically identify and group sources that share similar behaviour. </p><p>We here present i-DarkVec, a methodology to learn meaningful representations of Darknet traffic. i-DarkVec leverages Natural Language Processing techniques (e.g., Word2Vec) to capture the co-occurrence patterns that emerge when scanners or bots launch coordinated actions. As in NLP problems, the embeddings learned with i-DarkVec enable several new machine learning tasks on the darknet traffic, such as identifying clusters of senders engaged in similar activities. </p><p>We extensively test i-DarkVec and explore its design space in a case study using real darknets. We show that with a proper definition of <i>services</i>, the learned embeddings can be used to (i) solve the classification problem to associate unknown sources’ IP addresses to the correct classes of coordinated actors, and (ii) automatically identify clusters of previously unknown sources performing similar attacks and scans, easing the security analyst’s job. i-DarkVec leverages a novel incremental embedding learning approach that is scalable and robust to traffic changes, making it applicable to dynamic and large-scale scenarios.</p>\",\"PeriodicalId\":50911,\"journal\":{\"name\":\"ACM Transactions on Internet Technology\",\"volume\":\"115 1\",\"pages\":\"\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2023-05-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Internet Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/https://dl.acm.org/doi/10.1145/3595378\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Internet Technology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/https://dl.acm.org/doi/10.1145/3595378","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
i-DarkVec: Incremental Embeddings for Darknet Traffic Analysis
Darknets are probes listening to traffic reaching IP addresses that host no services. Traffic reaching a darknet results from the actions of internet scanners, botnets and possibly misconfigured hosts. Such peculiar nature of the darknet traffic makes darknets a valuable instrument to discover malicious online activities, e.g., identifying coordinated actions performed by bots or scanners. However, the massive amount of packets and sources that darknets observe makes it hard to extract meaningful insights, calling for scalable tools to automatically identify and group sources that share similar behaviour.
We here present i-DarkVec, a methodology to learn meaningful representations of Darknet traffic. i-DarkVec leverages Natural Language Processing techniques (e.g., Word2Vec) to capture the co-occurrence patterns that emerge when scanners or bots launch coordinated actions. As in NLP problems, the embeddings learned with i-DarkVec enable several new machine learning tasks on the darknet traffic, such as identifying clusters of senders engaged in similar activities.
We extensively test i-DarkVec and explore its design space in a case study using real darknets. We show that with a proper definition of services, the learned embeddings can be used to (i) solve the classification problem to associate unknown sources’ IP addresses to the correct classes of coordinated actors, and (ii) automatically identify clusters of previously unknown sources performing similar attacks and scans, easing the security analyst’s job. i-DarkVec leverages a novel incremental embedding learning approach that is scalable and robust to traffic changes, making it applicable to dynamic and large-scale scenarios.
期刊介绍:
ACM Transactions on Internet Technology (TOIT) brings together many computing disciplines including computer software engineering, computer programming languages, middleware, database management, security, knowledge discovery and data mining, networking and distributed systems, communications, performance and scalability etc. TOIT will cover the results and roles of the individual disciplines and the relationshipsamong them.