Design of a Distributed Spiders System Based on Web Service
Liu Guangli, Z. Hongbin
2009 Second Pacific-Asia Conference on Web Mining and Web-based Application, 2009-06-06
DOI: 10.1109/WMWA.2009.15 (https://doi.org/10.1109/WMWA.2009.15)
Citations: 3
Abstract
A prototype distributed spider (web crawler) system was designed using Web services under a Service-Oriented Architecture (SOA). The prototype consists of a server and several clients. The server directs each client to download new web pages according to the pages already crawled. Each client analyzes downloaded pages with multiple threads and maintains the To Crawl, Crawled, and Noise URL queues. The clients also stay connected to the server so that they can pass it unknown URLs and domain names. The server consists of a front platform and a background. The front platform controls the clients: it implements the load-balancing policy and monitors the clients in real time through Microsoft Message Queue (MSMQ). The Web service is deployed on the server background, which contains the persistent data connection structure. Through this structure, the front platform and the clients access data via a standard interface. Finally, numerous experiments show that the distributed spider system is robust.
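
The client-side queue handling described in the abstract might look roughly like the Python sketch below. It is an illustration only, not the authors' implementation: the class name `ClientQueues`, the file-suffix rule for noise, and the idea that a URL is "unknown" when its host is outside the domains assigned to the client are all assumptions, since the abstract does not define them.

```python
import threading
from collections import deque
from urllib.parse import urlparse

# Assumption: links to these file types count as "noise" (not from the paper).
NOISE_SUFFIXES = (".jpg", ".png", ".gif", ".css", ".js", ".zip")


class ClientQueues:
    """URL bookkeeping for one crawler client (hypothetical sketch)."""

    def __init__(self, assigned_domains):
        self.assigned_domains = set(assigned_domains)  # domains this client crawls
        self.to_crawl = deque()   # To Crawl queue: URLs waiting to be downloaded
        self.crawled = set()      # Crawled queue: URLs already handed out
        self.noise = set()        # Noise queue: URLs judged not worth crawling
        self.for_server = []      # unknown URLs to pass on to the server
        self._lock = threading.Lock()  # queues are shared by analysis threads

    def add_link(self, url):
        """Route one link extracted from a page to the appropriate queue."""
        parsed = urlparse(url)
        with self._lock:
            if parsed.path.lower().endswith(NOISE_SUFFIXES):
                self.noise.add(url)
            elif parsed.hostname not in self.assigned_domains:
                # Assumed meaning of "unknown": outside this client's domains.
                if url not in self.for_server:
                    self.for_server.append(url)
            elif url not in self.crawled and url not in self.to_crawl:
                self.to_crawl.append(url)

    def next_url(self):
        """Pop the next URL to download and record it as crawled."""
        with self._lock:
            if not self.to_crawl:
                return None
            url = self.to_crawl.popleft()
            self.crawled.add(url)
            return url
```

In this sketch a worker thread would call `next_url()`, download and parse the page, and pass every extracted link to `add_link()`; the contents of `for_server` would then be sent to the server over whatever Web service interface the system exposes.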