A comprehensive review on machine learning-based VPN detection: Scenarios, methods, and open challenges

IF 12.7 · Zone 1, Computer Science · JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

Computer Science Review · Pub Date: 2025-07-17 · DOI: 10.1016/j.cosrev.2025.100781

Alejandro Guerra-Manzanares, Maurantonio Caprolu, Roberto Di Pietro

Computer Science Review, volume 58, Article 100781. Journal Article. https://www.sciencedirect.com/science/article/pii/S1574013725000577

Citations: 0

Abstract

Virtual Private Networks (VPNs) are an essential tool for protecting user privacy and securing communications over the Internet. However, they can also be misused to bypass legitimate network security mechanisms and thereby access otherwise restricted content. This potential for misuse, combined with the continuous evolution of VPN technology, which has reached a considerable level of sophistication, makes detecting VPN traffic a research problem of keen interest to both academia and industry. In this paper, we provide a comprehensive review of machine learning (ML)-based solutions for VPN traffic detection. We start by framing the problem and identifying the main scenarios and the related adversary models. Then, we provide a thorough analysis of the related literature and the state of the art in ML methodologies for VPN detection, identifying research gaps and unresolved challenges. In particular, we show that the vast majority of current solutions rely on a specific dataset that suffers from a few severe limitations, which calls into question the validity of the reported results in real-world scenarios. Finally, we summarize existing knowledge, highlighting common mistakes and providing guidelines as well as future research directions. To the best of our knowledge, this is the first paper that provides a deep dive into ML methodologies for VPN detection, showing current pitfalls, providing actionable recommendations, and suggesting research directions.

Source journal

Computer Science Review Computer Science-General Computer Science

CiteScore

32.70

Self-citation rate

0.00%

Publication volume

Review time

51 days

About the journal: Computer Science Review, a publication dedicated to research surveys and expository overviews of open problems in computer science, targets a broad audience within the field seeking comprehensive insights into the latest developments. The journal welcomes articles from various fields as long as their content impacts the advancement of computer science. In particular, articles that review the application of well-known Computer Science methods to other areas are in scope only if these articles advance the fundamental understanding of those methods.