Design and Implementation of Information Subscription System Based on Intelligent Crawler

Peng Nie, Yu-Chen Zheng, Ruixuan Wang, Chu-Qiao Chen, Jianxiong Dong, Jia-Hao Liu
Published in: 2020 2nd International Conference on Information Technology and Computer Application (ITCA)
Publication date: 2020-12-01
DOI: 10.1109/ITCA52113.2020.00159 (https://doi.org/10.1109/ITCA52113.2020.00159)
Citations: 0

Abstract

As the volume of data continues to grow in the information age, it is becoming increasingly difficult for people to obtain the information they care about. The traditional approach of manually collecting information from such massive data is inconvenient and inefficient. To solve this problem, we design a web information subscription system based on an intelligent crawler. The first section of this paper introduces the design of the system, which includes the two confirmations used to determine the specific monitoring area, the original XPath-based information positioning and block method, and task management based on apscheduler, among others. The second section introduces the implementation of the system, including how the system is operated. The third section shows the running interface and results of the system. Finally, experiments show that the system can help users quickly obtain useful web information.
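
The abstract names XPath-based positioning of a monitored page region but gives no implementation details. The following is a minimal sketch of that general idea, not the authors' method: it locates one block of a page with an XPath expression and uses a hash of the block's text to decide whether the block has changed. The function names, the sample pages, and the `MONITORED_XPATH` value are all hypothetical. It relies on the limited XPath subset in Python's standard `xml.etree.ElementTree`, which only parses well-formed markup; real-world HTML would likely need a more tolerant parser.

```python
import hashlib
import xml.etree.ElementTree as ET


def extract_block(page_source: str, xpath: str) -> str:
    """Return the whitespace-normalized text of the first element matching xpath."""
    root = ET.fromstring(page_source)  # assumes well-formed markup
    node = root.find(xpath)
    if node is None:
        raise ValueError(f"no element matches {xpath!r}")
    # Join all text fragments inside the block, then collapse runs of whitespace.
    return " ".join(" ".join(node.itertext()).split())


def fingerprint(text: str) -> str:
    """Hash the block text so that only a short digest has to be stored."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_for_update(page_source, xpath, last_fingerprint):
    """Return (changed, new_fingerprint, block_text) for the monitored block."""
    text = extract_block(page_source, xpath)
    new_fingerprint = fingerprint(text)
    return new_fingerprint != last_fingerprint, new_fingerprint, text


# Hypothetical sample pages: only the "news" block is monitored.
OLD_PAGE = """<html><body>
<div id="nav">Home | About</div>
<div id="news"><ul><li>Notice A</li></ul></div>
</body></html>"""

NEW_PAGE = OLD_PAGE.replace(
    "<li>Notice A</li>", "<li>Notice B</li><li>Notice A</li>"
)

MONITORED_XPATH = ".//div[@id='news']"  # hypothetical user-confirmed region

# The first check establishes the baseline fingerprint.
_, baseline, _ = check_for_update(OLD_PAGE, MONITORED_XPATH, None)
unchanged = check_for_update(OLD_PAGE, MONITORED_XPATH, baseline)
changed = check_for_update(NEW_PAGE, MONITORED_XPATH, baseline)
```

In a subscription setting, a function like `check_for_update` could be registered as a periodic job with a task scheduler (the abstract mentions apscheduler for task management), with a notification sent to the subscriber whenever the first element of the result is true.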