A comprehensive review on machine learning-based VPN detection: Scenarios, methods, and open challenges
Alejandro Guerra-Manzanares, Maurantonio Caprolu, Roberto Di Pietro
Computer Science Review, vol. 58, Article 100781. Published 2025-07-17. DOI: 10.1016/j.cosrev.2025.100781
https://www.sciencedirect.com/science/article/pii/S1574013725000577
Journal impact factor: 12.7 (JCR Q1, Computer Science, Information Systems)
Citation count: 0
Abstract
Virtual Private Networks (VPNs) are an essential tool for protecting user privacy and securing communications over the Internet. However, they can also be misused to bypass legitimate network security mechanisms and thereby access otherwise restricted content. These reasons, combined with the fact that VPN technology has continuously evolved and reached a considerable level of sophistication, make detecting VPN traffic a pressing research issue for both academia and industry. In this paper, we provide a comprehensive review of machine learning (ML)-based solutions for VPN traffic detection. We start by framing the problem and identifying the main scenarios and related adversary models. Then, we provide a thorough analysis of the related literature and the state of the art in ML methodologies for VPN detection, identifying research gaps and unresolved challenges. In particular, we show that the vast majority of current solutions rely on a specific dataset that suffers from a few severe limitations, which calls into question the validity of the reported results when applied to real-world scenarios. Finally, we summarize existing knowledge, highlighting common mistakes and providing guidelines as well as future research directions. To the best of our knowledge, this is the first paper that provides a deep dive into ML methodologies for VPN detection, showing current pitfalls, providing actionable recommendations, and suggesting research directions.
About the journal:
Computer Science Review, a publication dedicated to research surveys and expository overviews of open problems in computer science, targets a broad audience within the field seeking comprehensive insights into the latest developments. The journal welcomes articles from various fields as long as their content impacts the advancement of computer science. In particular, articles that review the application of well-known Computer Science methods to other areas are in scope only if these articles advance the fundamental understanding of those methods.