Design of Four Web Crawlers based on Python
Pan Liu, Yihao Li, Xuankui Zheng, Shili Ai, W. Zhang
2022 8th International Symposium on System Security, Safety, and Reliability (ISSSR), 2022-10-01
DOI: 10.1109/ISSSR56778.2022.00040
Citations: 1
Abstract
Website data is an important source for both big data analysis and machine learning. Because some websites restrict data crawling, a general-purpose web crawler does not work on them. To make it easier to crawl websites with different structures, this paper introduces four types of web crawlers and, using third-party libraries developed for Python, designs a corresponding Python program for each. The paper serves as a technical guide for researchers who want to build web crawlers quickly.