A Dynamic Proxy Based Crawler Strategy for Data Collection on CyberGIS
Authors: Shumiao Yu, Weifeng Sun, Minghan Jia
Published in: 2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2018-10-01
DOI: https://doi.org/10.1109/CYBERC.2018.00094
With the development of geographic information systems, the digital earth and the digital city play increasingly important roles in daily life. In IoT distributed systems, the data generated by sensors and other edge nodes, such as the GIS data in CyberGIS, must be collected by crawlers. In some edge networks, operators limit crawlers to avoid disturbance to their servers, for example by blocking the requesting IP addresses or by requiring logins and verification codes. To collect data from web servers in these edge networks, a dynamic IP address based strategy, DP-crawler, is proposed to overcome such anti-crawler measures. DP-crawler dynamically obtains suitable IP addresses from a security-aware list and selects the best available proxies. The security-aware list is built on a blockchain, which provides both security and dynamic storage. In the experiments, DP-crawler is used to crawl websites and obtains the detailed information of Douban movies. The experimental results show that more information can be collected by using DP-crawler.
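The abstract does not give the details of how proxies are ranked or how the list is stored, so the following is only a minimal sketch of the general idea of picking the best available proxy and dropping ones that appear blocked. The names `ProxyPool`, `ProxyStats`, `crawl`, the smoothed success-rate score, and the `max_failures` threshold are all assumptions made for illustration, and a plain in-memory dictionary stands in for the blockchain-backed security-aware list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class ProxyStats:
    """Per-proxy counters (hypothetical bookkeeping, not from the paper)."""
    successes: int = 0
    failures: int = 0
    blocked: bool = False

    def score(self) -> float:
        # Smoothed success rate: an untried proxy starts at 0.5.
        return (self.successes + 1) / (self.successes + self.failures + 2)


class ProxyPool:
    """In-memory stand-in for the security-aware proxy list."""

    def __init__(self, addresses: Iterable[str], max_failures: int = 3):
        self.stats: Dict[str, ProxyStats] = {a: ProxyStats() for a in addresses}
        self.max_failures = max_failures  # assumed threshold for "blocked"

    def best(self) -> Optional[str]:
        """Return the unblocked proxy with the highest score, or None."""
        available = [a for a, s in self.stats.items() if not s.blocked]
        if not available:
            return None
        # On ties, max() keeps the first address in insertion order.
        return max(available, key=lambda a: self.stats[a].score())

    def report(self, address: str, ok: bool) -> None:
        """Record the outcome of one request made through `address`."""
        s = self.stats[address]
        if ok:
            s.successes += 1
        else:
            s.failures += 1
            if s.failures >= self.max_failures:
                s.blocked = True


def crawl(url: str, pool: ProxyPool,
          fetch: Callable[[str, str], str],
          max_attempts: int = 5) -> Optional[str]:
    """Try `fetch(url, proxy)` through the best proxy, switching on failure."""
    for _ in range(max_attempts):
        proxy = pool.best()
        if proxy is None:
            break
        try:
            body = fetch(url, proxy)
        except Exception:
            pool.report(proxy, ok=False)
            continue
        pool.report(proxy, ok=True)
        return body
    return None
```

The `fetch` callable is injected so that the selection logic can be exercised without network access; in practice it could wrap any HTTP client that accepts a proxy address.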