Design of a Distributed Spiders System Based on Web Service

Liu Guangli, Z. Hongbin
Published in: 2009 Second Pacific-Asia Conference on Web Mining and Web-based Application, 2009-06-06
DOI: 10.1109/WMWA.2009.15 (https://doi.org/10.1109/WMWA.2009.15)
Citations: 3

Abstract

A distributed spider (web crawler) prototype was designed using Web Services on a Service-Oriented Architecture (SOA). The prototype consists of a server and several clients. The server directs each client to download new web pages based on the pages already crawled. After analyzing each downloaded page with multiple threads, the clients manage the To Crawl, Crawled, and Noise URL queues. They also stay connected to the server to pass along unknown URLs and domain names. The server consists of a front platform and a background. The front platform controls the clients, which includes the load-balancing policy and real-time monitoring of clients through Microsoft Message Queue (MSMQ). The Web service is deployed on the server background, which contains the persistent data connection structure. Through this structure, the front platform and the clients access data via a standardized interface. Finally, extensive experiments show that the distributed spider system is robust.
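
The client-side bookkeeping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `ClientFrontier`, the `own_domains` parameter, and the file-suffix rule for deciding what counts as a "noise" URL are all assumptions made for illustration, since the abstract does not define them. The sketch only shows how a client might sort discovered URLs into To Crawl, Crawled, and Noise collections and set aside unknown-domain URLs to pass back to the server.

```python
import threading
from collections import deque
from urllib.parse import urlparse


class ClientFrontier:
    """Hypothetical per-client URL bookkeeping (names are illustrative)."""

    def __init__(self, own_domains, noise_suffixes=(".jpg", ".png", ".css", ".js")):
        self.own_domains = set(own_domains)    # domains assigned to this client (assumed)
        self.noise_suffixes = tuple(noise_suffixes)  # assumed definition of "noise"
        self.to_crawl = deque()                # To Crawl queue
        self.crawled = set()                   # Crawled URLs
        self.noise = set()                     # Noise URLs
        self.unknown = []                      # URLs to report to the server
        self._seen = set()                     # every URL classified so far
        self._lock = threading.Lock()          # analysis runs in multiple threads

    def add(self, url):
        """Classify one discovered URL and return the bucket it went into."""
        with self._lock:
            if url in self._seen:
                return "duplicate"
            self._seen.add(url)
            parsed = urlparse(url)
            is_web = parsed.scheme in ("http", "https")
            if not is_web or parsed.path.lower().endswith(self.noise_suffixes):
                self.noise.add(url)
                return "noise"
            if parsed.hostname not in self.own_domains:
                self.unknown.append(url)
                return "unknown"
            self.to_crawl.append(url)
            return "to_crawl"

    def next_url(self):
        """Pop the next URL to download and mark it as crawled."""
        with self._lock:
            if not self.to_crawl:
                return None
            url = self.to_crawl.popleft()
            self.crawled.add(url)
            return url
```

In this sketch a client would call `add` for every link extracted from a downloaded page, call `next_url` to pick its next download, and periodically send the contents of `unknown` to the server, which decides which client should handle those domains.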