Pan Liu, Yihao Li, Xuankui Zheng, Shili Ai, W. Zhang
Design of Four Web Crawlers based on Python
2022 8th International Symposium on System Security, Safety, and Reliability (ISSSR), published 2022-10-01
https://doi.org/10.1109/ISSSR56778.2022.00040
Website data is an important source for both big data analysis and machine learning. Because some websites restrict data crawling, a general-purpose web crawler does not work on them. To facilitate crawling websites with different structures, this paper introduces four types of web crawlers and, using third-party libraries developed for Python, designs a corresponding Python program for each. The paper serves as a technical guide for researchers who want to build web crawlers quickly.
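
The abstract does not name the four crawler types or the libraries used, so the following is only a minimal sketch of the basic step most crawlers share: parsing a page and collecting the links to visit next. It relies on the Python standard library rather than any third-party package, and the names `LinkExtractor`, `extract_links`, and `SAMPLE_HTML`, along with the example URLs, are hypothetical and not taken from the paper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in the given HTML text, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content; a real crawler would download this first.
SAMPLE_HTML = '<p><a href="/a.html">A</a> <a href="https://example.org/b">B</a></p>'
links = extract_links(SAMPLE_HTML, "https://example.com/index.html")
print(links)
```

A full crawler would add a download step, a queue of pending URLs, and a set of visited URLs around this function; sites that restrict crawling typically need more than this, which is the gap the paper says its four designs address.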