{"title":"Big Data Analyses of ZeroNet Sites for Exploring the New Generation DarkWeb","authors":"Jianwei Ding, Xiaoyu Guo, Zhouguo Chen","doi":"10.1145/3378936.3378981","DOIUrl":null,"url":null,"abstract":"ZeroNet is a new generation typical dark web, which uses the Bitcoin encryption algorithm and BitTorrent technology to build a distributed and censored-resistant communication network. Based on our cumulative studies on the onion router, we present a big data analyses framework for automated multi-categorization of ZeroNet websites to facilitate analyst situational awareness of new content that emerges from this dynamic landscape. Over the last two years, our team has developed a distributed crawling infrastructure called ZeroCrawler that automatically crawls and updates ZeroNet websites in realtime. It stores data into a research repository designed to help better understand ZeroNet's hidden service ecosystem. The analysis component of our framework is called Automated Multi-Categorization Labeling (AMCL), which introduces a three-stage thematic labeling strategy: (1) it learns descriptive and discriminative keywords for different categories, and (2) get a probability distribution of the keywords for different categories, and then (3) uses these terms to map ZeroNet website content to several labels. We also present empirical results of AMCL and our ongoing experimentation with it, as we have gained experience applying it to the entirety of our ZeroNet repository, now over 3000 indexed websites. The experimental results show that AMCL can discover categories on previously unlabeled websites, and we discuss applications of AMCL in supporting various analyses and investigations of the ZeroNet websites.","PeriodicalId":304149,"journal":{"name":"Proceedings of the 3rd International Conference on Software Engineering and Information Management","volume":"57 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 3rd International Conference on Software Engineering and Information Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3378936.3378981","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
ZeroNet is a new generation typical dark web, which uses the Bitcoin encryption algorithm and BitTorrent technology to build a distributed and censored-resistant communication network. Based on our cumulative studies on the onion router, we present a big data analyses framework for automated multi-categorization of ZeroNet websites to facilitate analyst situational awareness of new content that emerges from this dynamic landscape. Over the last two years, our team has developed a distributed crawling infrastructure called ZeroCrawler that automatically crawls and updates ZeroNet websites in realtime. It stores data into a research repository designed to help better understand ZeroNet's hidden service ecosystem. The analysis component of our framework is called Automated Multi-Categorization Labeling (AMCL), which introduces a three-stage thematic labeling strategy: (1) it learns descriptive and discriminative keywords for different categories, and (2) get a probability distribution of the keywords for different categories, and then (3) uses these terms to map ZeroNet website content to several labels. We also present empirical results of AMCL and our ongoing experimentation with it, as we have gained experience applying it to the entirety of our ZeroNet repository, now over 3000 indexed websites. The experimental results show that AMCL can discover categories on previously unlabeled websites, and we discuss applications of AMCL in supporting various analyses and investigations of the ZeroNet websites.