Splider: A split-based crawler of the BT-DHT network and its applications

Bingshuang Liu, Shidong Wu, Tao Wei, Chao Zhang, Jun Yu Li, Jianyu Zhang, Yu Chen, Chen Li
2014 IEEE 11th Consumer Communications and Networking Conference (CCNC)
Publication date: 2014-07-28
DOI: 10.1109/CCNC.2014.6866591 (https://doi.org/10.1109/CCNC.2014.6866591)
Citations: 3

Abstract

Capturing accurate snapshots of peer-to-peer (P2P) networks, especially those with millions of users, is essential to many P2P-based applications, including those that monitor and analyze P2P networks. The large scale and dynamic nature of P2P networks, however, make this task very challenging. Existing crawlers of P2P networks, for example, often miss a substantial portion of the ID space while needlessly crawling many nodes repeatedly. In this paper, we design and evaluate a new crawler called Splider. Unlike traditional crawling algorithms that adopt an iterative approach, Splider recursively splits the ID space of P2P nodes to crawl even tiny corners of the ID space, while avoiding crawling repeated nodes. We further implement a Splider prototype for BT-DHT, a Kademlia-based distributed hash table (DHT) P2P network, that exploits the structure of the routing tables at BT-DHT nodes. Experiments show that Splider is able to gather more than 16 million nodes with a 100% recall ratio, whereas a traditional iterative crawler can at best capture only about 8 million nodes with a 66% recall ratio, and its traffic-cost effectiveness is 50% lower than Splider's. Splider also supports distributed deployment; without any synchronization overhead, it reduces the time needed to capture a full snapshot to only about 3 minutes. Finally, we report and analyze the captured BT-DHT snapshots, including the spatial and temporal distribution of BT-DHT nodes and the existence of Sybil and eclipse attacks in BT-DHT.
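
The idea of recursively splitting the ID space can be illustrated with a toy simulation. The sketch below is not the authors' implementation: it assumes a hypothetical oracle `query_range` that returns at most `K` node IDs from a given range (a stand-in for a bounded-size DHT response), a 16-bit ID space instead of BT-DHT's 160 bits, and made-up names throughout. A range is split in half whenever a response comes back full, since a full response may have hidden further nodes.

```python
import bisect
import random

ID_BITS = 16  # toy ID space (assumption); BT-DHT IDs are 160 bits
K = 8         # assumed cap on nodes returned per query


def make_network(n, seed=0):
    """Build a simulated network: a sorted list of n distinct node IDs."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(2 ** ID_BITS), n))


def query_range(network, lo, hi):
    """Hypothetical oracle: return at most K node IDs that lie in [lo, hi)."""
    i = bisect.bisect_left(network, lo)
    j = bisect.bisect_left(network, hi)
    return network[i:min(j, i + K)]


def split_crawl(network, lo=0, hi=2 ** ID_BITS, stats=None):
    """Enumerate all nodes in [lo, hi) by recursive halving of the range."""
    if stats is None:
        stats = {"queries": 0}
    found = query_range(network, lo, hi)
    stats["queries"] += 1
    # Fewer than K results means the range was fully enumerated;
    # a single-ID range cannot be split any further.
    if len(found) < K or hi - lo <= 1:
        return set(found)
    mid = (lo + hi) // 2
    left = split_crawl(network, lo, mid, stats)
    right = split_crawl(network, mid, hi, stats)
    return left | right


network = make_network(1000)
stats = {"queries": 0}
snapshot = split_crawl(network, stats=stats)
recall = len(snapshot & set(network)) / len(network)
print(f"nodes={len(snapshot)} recall={recall:.0%} queries={stats['queries']}")
```

Because the leaf ranges are disjoint, the two halves of any range could in principle be handed to separate workers that never need to exchange state, which is one plausible reading of how a split-based design avoids synchronization overhead.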