Perancangan System Crawler dengan MenerapkanArsitektur Distributed Task

Heri Santoso, Indyah Hartami Santi, Ni’ma Kholila
{"title":"Perancangan System Crawler dengan MenerapkanArsitektur Distributed Task","authors":"Heri Santoso, Indyah Hartami Santi, Ni’ma Kholila","doi":"10.28932/jutisi.v8i1.4205","DOIUrl":null,"url":null,"abstract":"PDC Media Group is a company engaged in online trading. The need for data insight in the online marketplace is very important. Likewise, how to get quite a lot of data, of course, requires automation such as crawling data on the marketplace website. Due to the large amount of data, crawler systems are often not optimal in crawling data. The application of distributed tasks on the crawler system provides convenience in scaling the server both vertically and horizontally. Therefore, the large and growing data can be handled by the crawler system. The application was developed with the Python language, with the  application server using Google Cloud Computing. In a distributed task architecture requires a component in the form of a message broker. The message broker used in designing this system is RabbitMQ. Testing the crawler system uses 3 scenarios, namely with 1 worker, 2worker, and 3 worker. The results for the 1 worker scenario are 19.3 requests per second and 332 ms for response time. The results for the 2 worker scenario are 41.4 requests per second and 328 ms for response time. While the results for the 3 worker scenario are 60 requests per second and 331 ms for response time.","PeriodicalId":185279,"journal":{"name":"Jurnal Teknik Informatika dan Sistem Informasi","volume":"28 7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Jurnal Teknik Informatika dan Sistem Informasi","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.28932/jutisi.v8i1.4205","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

PDC Media Group is a company engaged in online trading. The need for data insight in the online marketplace is very important. Likewise, how to get quite a lot of data, of course, requires automation such as crawling data on the marketplace website. Due to the large amount of data, crawler systems are often not optimal in crawling data. The application of distributed tasks on the crawler system provides convenience in scaling the server both vertically and horizontally. Therefore, the large and growing data can be handled by the crawler system. The application was developed with the Python language, with the  application server using Google Cloud Computing. In a distributed task architecture requires a component in the form of a message broker. The message broker used in designing this system is RabbitMQ. Testing the crawler system uses 3 scenarios, namely with 1 worker, 2worker, and 3 worker. The results for the 1 worker scenario are 19.3 requests per second and 332 ms for response time. The results for the 2 worker scenario are 41.4 requests per second and 328 ms for response time. While the results for the 3 worker scenario are 60 requests per second and 331 ms for response time.
Perancangan系统爬行器dengan MenerapkanArsitektur分布式任务
PDC传媒集团是一家从事网上交易的公司。在线市场对数据洞察力的需求非常重要。同样,如何获得大量的数据,当然需要自动化,比如在市场网站上抓取数据。由于数据量大,爬虫系统在抓取数据时往往不是最优的。分布式任务在爬虫系统上的应用为纵向和横向扩展服务器提供了便利。因此,爬虫系统可以处理大量和不断增长的数据。应用程序使用Python语言开发,应用服务器使用谷歌云计算。在分布式任务体系结构中,需要消息代理形式的组件。本系统设计中使用的消息代理是RabbitMQ。测试爬虫系统使用3个场景,即1个worker、2个worker和3个worker。1个worker场景的结果是每秒19.3个请求和332 ms的响应时间。2 worker场景的结果是每秒41.4个请求和328毫秒的响应时间。而对于3个worker场景,结果是每秒60个请求,响应时间为331毫秒。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信