ProbeGeo: A Comprehensive Landmark Mining Framework Based on Web Content
Jinlei Lin; Chenglong Li; Wenwen Gong; Guanglei Song; Linna Fan; Zhiliang Wang; Jiahai Yang
IEEE/ACM Transactions on Networking, vol. 32, no. 5, pp. 4398-4413, published 2024-07-30
DOI: 10.1109/TNET.2024.3422089
https://ieeexplore.ieee.org/document/10615999/
Citation count: 0
Abstract
IP geolocation is essential for various location-aware Internet applications. High-quality IP geolocation landmarks play a decisive role in IP geolocation accuracy. However, previous work on mining landmarks from the Internet is hampered by limited quantity, poor coverage, and insufficient landmark quality. In this paper, we present ProbeGeo, a new framework that mines high-quality landmarks automatically. We divide landmarks into common landmarks and probe landmarks and provide systematic mining methods based on online retrieval and web content. ProbeGeo expands traditional common landmarks by taking advantage of the exposure of multiple IoT (Internet of Things) devices on the Internet, mining them through search engines and webpage contents. Common landmarks, which consist of multiple device types, significantly improve landmark quantity and coverage. Furthermore, ProbeGeo establishes a methodology for acquiring new probe landmarks from the webpages of Internet VPs (Vantage Points), extracting geographical locations from heterogeneous webpages and utilizing their active probing functions. Probe landmarks enhance landmark quality and functionality, enabling new geolocation frameworks and breaking through the geolocation accuracy bottleneck. We develop ProbeGeo as a continuously running system and conduct real-world experiments to validate its efficacy. Our results show that ProbeGeo can detect 89,849 high-quality landmarks, including 6,874 probe landmarks and 82,975 common landmarks. This is about 10x more landmarks than existing work, distributed across 181 countries and 7,094 cities. ProbeGeo landmarks cover more than 8 types of devices, and more than 60% of them remain stable over one month. Moreover, the landmark accuracy of more than 58% of ProbeGeo landmarks is above street level, which has not been achieved in previous work. ProbeGeo can provide geolocation services with higher landmark accuracy and broader coverage by correlating a large number of landmarks.
About the journal:
The IEEE/ACM Transactions on Networking’s high-level objective is to publish high-quality, original research results derived from theoretical or experimental exploration of the area of communication/computer networking, covering all sorts of information transport networks over all sorts of physical layer technologies, both wireline (all kinds of guided media: e.g., copper, optical) and wireless (e.g., radio-frequency, acoustic (e.g., underwater), infra-red), or hybrids of these. The journal welcomes applied contributions reporting on novel experiences and experiments with actual systems.