Design and Implementation of Information Subscription System Based on Intelligent Crawler

2020 2nd International Conference on Information Technology and Computer Application (ITCA). Pub Date: 2020-12-01. DOI: 10.1109/ITCA52113.2020.00159

Peng Nie, Yu-Chen Zheng, Ruixuan Wang, Chu-Qiao Chen, Jianxiong Dong, Jia-Hao Liu


Citations: 0

Abstract


As data volumes keep growing in the information age, it is becoming increasingly difficult for people to find the information they care about. Manually collecting information from such massive data is inconvenient and inefficient. To address this problem, we design a web information subscription system based on an intelligent crawler. The first section of this paper introduces the design of the system, which includes two confirmations to determine the specific monitoring area, an original XPath-based information positioning and blocking method, task management based on APScheduler, and so on. The second section introduces the implementation of the system, including how the system is operated. The third section shows the running interface and results of the system. Finally, experiments show that the system can help users quickly obtain useful web information.
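The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: locate a monitored block with an XPath expression, fingerprint its text, and check for changes on a schedule. Everything in it is an assumption made for illustration. The page snapshots, the helper names `block_fingerprint`, `has_changed`, and `run_checks`, and the XPath expression are hypothetical, and the standard-library `sched` module and ElementTree's limited XPath subset stand in for APScheduler and a full XPath engine.

```python
import hashlib
import sched
import time
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page snapshots; a real crawler would download them.
PAGE_V1 = "<html><body><div id='news'><p>Item A</p></div><div id='ads'><p>x</p></div></body></html>"
PAGE_V2 = "<html><body><div id='news'><p>Item A</p></div><div id='ads'><p>y</p></div></body></html>"
PAGE_V3 = "<html><body><div id='news'><p>Item A</p><p>Item B</p></div><div id='ads'><p>y</p></div></body></html>"


def block_fingerprint(page: str, xpath: str) -> str:
    """Hash the text of the first element matched by `xpath`."""
    root = ET.fromstring(page)
    block = root.find(xpath)  # ElementTree supports only a subset of XPath
    if block is None:
        raise ValueError(f"no element matches {xpath!r}")
    text = "".join(block.itertext())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(old_page: str, new_page: str, xpath: str) -> bool:
    """Report whether the monitored block differs between two snapshots."""
    return block_fingerprint(old_page, xpath) != block_fingerprint(new_page, xpath)


def run_checks(snapshots, xpath, interval=0.01):
    """Compare successive snapshots at a fixed interval.

    Returns the indices of snapshots whose monitored block changed.
    The stdlib scheduler is an assumed stand-in for APScheduler.
    """
    notifications = []
    scheduler = sched.scheduler(time.monotonic, time.sleep)

    def check(index):
        if has_changed(snapshots[index - 1], snapshots[index], xpath):
            notifications.append(index)

    for i in range(1, len(snapshots)):
        scheduler.enter(interval * i, 1, check, argument=(i,))
    scheduler.run()  # blocks until every scheduled check has run
    return notifications


if __name__ == "__main__":
    pages = [PAGE_V1, PAGE_V2, PAGE_V3]
    # Only the 'news' block is monitored, so the change in 'ads' is ignored.
    print(run_checks(pages, ".//div[@id='news']"))
```

Restricting the comparison to one block is what keeps unrelated page changes, such as rotating advertisements, from triggering a notification. Real web pages are often not well-formed XML, so a tolerant HTML parser would be needed in practice.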
