Trust the Source: A latency-based machine learning approach to accurate IP geolocation in internet
Miguel A. Ortega-Velázquez, Alejandro S. Martínez-Sala, Pilar Manzanares-López, Maria-Dolores Cano, Antonio J. Jara
Computer Networks, vol. 272, Article 111721, published 2025-09-12. DOI: 10.1016/j.comnet.2025.111721
https://www.sciencedirect.com/science/article/pii/S1389128625006875
Abstract
IP geolocation is the process of determining the geographic location of an Internet-connected device from its IP address. Ensuring the authenticity of data sources has become critical for robust cybersecurity, and IP geolocation plays a vital role in safeguarding systems by enabling applications such as fraud prevention, cybercrime investigations, and location-based access control. There are two main approaches to IP geolocation: passive methods, which rely on public or historical data that may be outdated or inaccurate, and active methods, which infer location from real-time latency measurements or routing-path topology. Inspired by wireless location systems and the fingerprinting technique, this work proposes an active IP geolocation system that uses machine learning (ML) to estimate IP locations from Round-Trip Time (RTT) latency measurements taken by a distributed network of probing nodes, referred to as Monitors. A central Coordinator collects RTT data from Monitors pinging known Landmarks and builds RTT fingerprints, which are then used to train ML models that infer the location of unknown target nodes. The testbed, consisting of a Coordinator server and six Monitors distributed across Europe, ran a 65-day measurement campaign. More than 2 million RTT samples were collected from approximately 1,700 Landmarks (used to train and test the ML models) and 1,200 targets (used to evaluate the system). The K-Nearest Neighbours (KNN) and Multi-Layer Perceptron (MLP) algorithms are evaluated and compared with the reference Constraint-Based Geolocation (CBG) approach. The evaluation shows that the proposed system geolocates a target with a mean error of 317.6 km, a 38% reduction compared with the CBG baseline. In addition, the average delay to complete the geolocation process is less than 5 s. These results demonstrate a scalable and cost-effective solution for IP geolocation in cybersecurity contexts, offering medium-grained accuracy with bounded delay.
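The fingerprinting step described in the abstract can be illustrated with a minimal K-Nearest Neighbours sketch. It assumes each fingerprint is a vector of RTTs in milliseconds with one entry per Monitor, compares fingerprints by Euclidean distance, and estimates a target's position as the plain average of the coordinates of the k closest Landmarks. Errors are measured as great-circle (haversine) distance in km. The function names (`knn_geolocate`, `haversine_km`, `mean_error_km`), the distance metric, the value of k, the averaging rule and the toy data are all hypothetical illustrations, not the configuration or data reported by the authors; the MLP model and the CBG baseline are not shown.

```python
import math
from typing import List, Sequence, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def knn_geolocate(fingerprints: Sequence[Sequence[float]],
                  locations: Sequence[LatLon],
                  target_rtts: Sequence[float],
                  k: int = 3) -> LatLon:
    """Hypothetical helper: estimate a target's (lat, lon) from its RTT fingerprint.

    fingerprints[i] holds one RTT in ms per Monitor for Landmark i;
    locations[i] is that Landmark's known (lat, lon).
    """
    if len(fingerprints) != len(locations) or not fingerprints:
        raise ValueError("need one known location per Landmark fingerprint")
    if k < 1:
        raise ValueError("k must be at least 1")

    def rtt_distance(fp: Sequence[float]) -> float:
        # Euclidean distance in RTT space; every vector must cover the same Monitors.
        if len(fp) != len(target_rtts):
            raise ValueError("fingerprint length differs from target RTT vector")
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(fp, target_rtts)))

    # Indices of the k Landmarks whose fingerprints are closest to the target's.
    nearest = sorted(range(len(fingerprints)),
                     key=lambda i: rtt_distance(fingerprints[i]))[:k]
    # Plain average of neighbour coordinates (an assumption): acceptable at
    # European scale, not for Landmark sets that straddle the antimeridian.
    lat = sum(locations[i][0] for i in nearest) / len(nearest)
    lon = sum(locations[i][1] for i in nearest) / len(nearest)
    return lat, lon


def mean_error_km(predicted: List[LatLon], actual: List[LatLon]) -> float:
    """Mean great-circle error between estimated and true target positions."""
    errors = [haversine_km(p, a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


# Toy data (hypothetical values, not from the measurement campaign):
# three Monitors, four Landmarks placed at European capitals.
LANDMARK_RTTS = [[10.0, 30.0, 45.0],   # Madrid
                 [25.0, 12.0, 30.0],   # Paris
                 [40.0, 22.0, 10.0],   # Berlin
                 [35.0, 28.0, 25.0]]   # Rome
LANDMARK_LOCS = [(40.4168, -3.7038), (48.8566, 2.3522),
                 (52.5200, 13.4050), (41.9028, 12.4964)]

if __name__ == "__main__":
    estimate = knn_geolocate(LANDMARK_RTTS, LANDMARK_LOCS, [12.0, 28.0, 44.0], k=1)
    print(estimate)  # → (40.4168, -3.7038)
```

With k=1 the estimate is simply the position of the closest Landmark. Running the same toy target with k=2 averages the Madrid and Paris coordinates, since those two fingerprints are nearest in RTT space.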
About the journal:
Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.