Big Data Analyses of ZeroNet Sites for Exploring the New Generation DarkWeb

Jianwei Ding, Xiaoyu Guo, Zhouguo Chen
Published in: Proceedings of the 3rd International Conference on Software Engineering and Information Management, 2020-01-12
DOI: https://doi.org/10.1145/3378936.3378981
Citations: 5

Abstract

ZeroNet is a typical new-generation dark web that uses Bitcoin cryptography and BitTorrent technology to build a distributed, censorship-resistant communication network. Building on our earlier studies of The Onion Router (Tor), we present a big data analysis framework for automated multi-categorization of ZeroNet websites, intended to improve analysts' situational awareness of new content emerging from this dynamic landscape. Over the last two years, our team has developed a distributed crawling infrastructure called ZeroCrawler that automatically crawls ZeroNet websites and keeps them updated in real time. It stores the data in a research repository designed to help better understand ZeroNet's hidden service ecosystem. The analysis component of our framework, Automated Multi-Categorization Labeling (AMCL), introduces a three-stage thematic labeling strategy: (1) it learns descriptive and discriminative keywords for different categories, (2) it obtains a probability distribution of the keywords for each category, and (3) it uses these terms to map ZeroNet website content to several labels. We also present empirical results for AMCL and our ongoing experimentation with it, drawing on our experience applying it to our entire ZeroNet repository, which now holds over 3000 indexed websites. The experimental results show that AMCL can discover categories for previously unlabeled websites, and we discuss applications of AMCL in supporting various analyses and investigations of ZeroNet websites.
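
The abstract does not spell out how AMCL is implemented, so the following is only a minimal sketch of the general three-stage idea, not the authors' algorithm. The function names, the toy training texts, the keyword scoring heuristic (in-category count weighted by the share of a word's occurrences that fall in that category), and the 0.3 threshold are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def learn_keyword_distributions(labeled_docs, top_k=5):
    """Stages 1 and 2 (illustrative): pick keywords per category and
    estimate a probability distribution over those keywords."""
    counts = {}
    for category, text in labeled_docs:
        counts.setdefault(category, Counter()).update(tokenize(text))

    # Corpus-wide counts, used to favor words concentrated in one category.
    total = Counter()
    for cat_counts in counts.values():
        total.update(cat_counts)

    model = {}
    for category, cat_counts in counts.items():
        # Assumed heuristic: in-category count times in-category share.
        scored = {w: n * (n / total[w]) for w, n in cat_counts.items()}
        keywords = sorted(scored, key=lambda w: (-scored[w], w))[:top_k]
        mass = sum(cat_counts[w] for w in keywords)
        model[category] = {w: cat_counts[w] / mass for w in keywords}
    return model


def label_site(text, model, threshold=0.3):
    """Stage 3 (illustrative): return every category whose share of the
    total keyword score reaches the threshold, so a site can get several labels."""
    tokens = Counter(tokenize(text))
    raw = {
        cat: sum(p * tokens[w] for w, p in dist.items())
        for cat, dist in model.items()
    }
    total = sum(raw.values())
    if total == 0:
        return []  # no known keyword appears in the text
    return sorted(cat for cat, score in raw.items() if score / total >= threshold)


# Hypothetical toy training data; a real repository would hold crawled site text.
labeled_docs = [
    ("forum", "forum thread reply post discussion forum thread"),
    ("market", "buy sell price bitcoin shop buy price"),
]
model = learn_keyword_distributions(labeled_docs)

print(label_site("buy bitcoin at a good price", model))
print(label_site("forum thread about where to buy and sell", model))
```

Normalizing the scores across categories before thresholding lets one site receive more than one label, which matches the multi-categorization goal described above; a production system would likely need smoothing, stop-word handling, and a threshold tuned on held-out data.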