Trust the Source: A latency-based machine learning approach to accurate IP geolocation in internet

IF 4.6 · Zone 2, Computer Science · JCR Q1, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Computer Networks Pub Date : 2025-09-12 DOI:10.1016/j.comnet.2025.111721

Miguel A. Ortega-Velázquez , Alejandro S. Martínez-Sala , Pilar Manzanares-López , Maria-Dolores Cano , Antonio J. Jara

Computer Networks, vol. 272, Article 111721. https://www.sciencedirect.com/science/article/pii/S1389128625006875

Citations: 0

Abstract

IP geolocation is the process of determining the geographic location of an Internet-connected device from its IP address. Ensuring the authenticity of data sources has become critical for robust cybersecurity and plays a vital role in safeguarding systems by enabling applications such as fraud prevention, cybercrime investigations, and location-based access controls. There are two main approaches to IP geolocation: passive methods, which rely on public or historical data that may be outdated or inaccurate; and active methods, which use real-time latency measurements or routing path topology to infer location. Inspired by wireless location systems and the fingerprinting technique, this work proposes an active IP geolocation system that uses Machine Learning (ML) to estimate IP locations from Round-Trip Time (RTT) latency measurements taken by a distributed network of probing nodes, referred to as Monitors. A central Coordinator collects RTT data from Monitors pinging known Landmarks to build RTT fingerprints, which are then used to train ML models that infer the location of unknown target nodes. The testbed system, consisting of a Coordinator server and six Monitors distributed across Europe, operated over a 65-day measurement campaign. More than 2 million RTT samples were collected from approximately 1700 Landmarks (used to train and test the ML models) and 1200 targets (used to evaluate the system). The K-Nearest Neighbours (KNN) and Multi-Layer Perceptron (MLP) algorithms are considered and compared with the reference Constraint-Based Geolocation (CBG) approach. The evaluation finds that the proposed system can geolocate a point with a mean error of 317.6 km, a 38% reduction compared to the CBG baseline. In addition, the average delay to complete the geolocation process is less than 5 s. These results demonstrate a scalable and cost-effective solution for medium-grained accuracy and bounded-delay IP geolocation in cybersecurity contexts.
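The fingerprinting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Monitor positions, the landmark grid, the RTT model (signal speed of about 200 km/ms in fibre plus a fixed 5 ms overhead), and the function names `simulate_rtt` and `knn_locate` are all assumptions made for illustration, and the data is synthetic and noise-free. It shows how a KNN estimator might map an RTT fingerprint to coordinates and how the geolocation error could be measured.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

# Hypothetical Monitor positions (six European cities, chosen for illustration).
MONITORS = [
    (40.4168, -3.7038),   # Madrid
    (48.8566, 2.3522),    # Paris
    (52.5200, 13.4050),   # Berlin
    (41.9028, 12.4964),   # Rome
    (59.3293, 18.0686),   # Stockholm
    (52.2297, 21.0122),   # Warsaw
]

def simulate_rtt(pos):
    """Synthetic RTT fingerprint in ms: one value per Monitor.

    Assumed model: round trip at ~200 km/ms plus a fixed 5 ms overhead.
    Real RTTs are noisier and depend on routing, not only on distance.
    """
    return [2 * haversine_km(m, pos) / 200.0 + 5.0 for m in MONITORS]

def knn_locate(fingerprint, landmarks, k=3):
    """Estimate (lat, lon) as the mean position of the k landmarks whose
    fingerprints are closest (Euclidean distance in RTT space).

    Averaging raw latitudes and longitudes is a simplification that is
    acceptable for a region like Europe, far from the poles and antimeridian.
    """
    ranked = sorted(landmarks, key=lambda lm: math.dist(fingerprint, lm["rtt"]))
    nearest = ranked[:k]
    lat = sum(lm["pos"][0] for lm in nearest) / len(nearest)
    lon = sum(lm["pos"][1] for lm in nearest) / len(nearest)
    return lat, lon

# Synthetic landmarks on a 2-degree grid, each with a known position
# and the fingerprint the Monitors would measure under the assumed model.
landmarks = []
for lat in range(36, 61, 2):
    for lon in range(-10, 31, 2):
        pos = (float(lat), float(lon))
        landmarks.append({"pos": pos, "rtt": simulate_rtt(pos)})

# A target whose true position is known only for evaluation.
target = (45.1, 7.7)
estimate = knn_locate(simulate_rtt(target), landmarks, k=3)
error_km = haversine_km(target, estimate)
print(f"estimate={estimate}, error={error_km:.1f} km")
```

With real measurements the fingerprints would come from repeated pings rather than a distance model, and the choice of `k` and of the distance metric would need to be tuned on the Landmark set.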




Source journal

Computer Networks (Engineering & Technology: Telecommunications)

CiteScore: 10.80

Self-citation rate: 3.60%

Articles published: 434

Review time: 8.6 months

About the journal: Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.