基于scrapy的抓取及淘宝用户行为特征分析

Jing Wang, Yuchun Guo
{"title":"基于scrapy的抓取及淘宝用户行为特征分析","authors":"Jing Wang, Yuchun Guo","doi":"10.1109/CyberC.2012.17","DOIUrl":null,"url":null,"abstract":"The widespread use of Internet provides a good environment for e-commerce. Study on e-commerce network characteristics always focuses on the Taobao. So far, researches based on Taobao are related to credit rating system, marketing strategy, analysis of characteristics of the seller and so on. The purpose of all these studies is to analyze online marketing transactions in e-commerce. In this paper, we analyze e-commerce network from the perspective of graph theory. Our contributions lie in two aspects as following: (1) crawl Taobao share-platform using Scrapy crawl architecture. After analyzing format of web pages in Taobao deeply, combined with the BFS and MHRW two kinds of sampling methods, we ran crawler on five PCs for 30 days. Besides, we list some big problems encountered in the crawling process, then give the final solution. In addition, we crawled one type of sellers' data in order to analyze relationships between sellers and buyers. (2) Analyze characteristics of users' behavior in Taobao share-platform based on obtained dataset. We intend to find the relationships between sellers and buyers connected by items in share-platform. Surprisingly, we find that share-platform is a tool for some buyers to advertise items for sellers who have high credit score, and other buyers only to help them to support the platform.","PeriodicalId":416468,"journal":{"name":"2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery","volume":"148 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"43","resultStr":"{\"title\":\"Scrapy-Based Crawling and User-Behavior Characteristics Analysis on Taobao\",\"authors\":\"Jing Wang, Yuchun Guo\",\"doi\":\"10.1109/CyberC.2012.17\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The widespread use of Internet provides a good environment for e-commerce. Study on e-commerce network characteristics always focuses on the Taobao. So far, researches based on Taobao are related to credit rating system, marketing strategy, analysis of characteristics of the seller and so on. The purpose of all these studies is to analyze online marketing transactions in e-commerce. In this paper, we analyze e-commerce network from the perspective of graph theory. Our contributions lie in two aspects as following: (1) crawl Taobao share-platform using Scrapy crawl architecture. After analyzing format of web pages in Taobao deeply, combined with the BFS and MHRW two kinds of sampling methods, we ran crawler on five PCs for 30 days. Besides, we list some big problems encountered in the crawling process, then give the final solution. In addition, we crawled one type of sellers' data in order to analyze relationships between sellers and buyers. (2) Analyze characteristics of users' behavior in Taobao share-platform based on obtained dataset. We intend to find the relationships between sellers and buyers connected by items in share-platform. Surprisingly, we find that share-platform is a tool for some buyers to advertise items for sellers who have high credit score, and other buyers only to help them to support the platform.\",\"PeriodicalId\":416468,\"journal\":{\"name\":\"2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery\",\"volume\":\"148 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"43\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CyberC.2012.17\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CyberC.2012.17","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 43

摘要

互联网的广泛使用为电子商务提供了良好的环境。对电子商务网络特征的研究一直集中在淘宝上。目前,基于淘宝的研究主要涉及信用评级体系、营销策略、卖家特征分析等方面。所有这些研究的目的都是为了分析电子商务中的在线营销交易。本文从图论的角度对电子商务网络进行了分析。我们的贡献主要体现在两个方面:(1)利用Scrapy抓取架构抓取淘宝分享平台。在深入分析了淘宝网页的格式后,结合BFS和MHRW两种采样方法,我们在五台pc上运行了爬虫程序30天。此外,我们还列出了爬行过程中遇到的一些大问题,并给出了最终的解决方案。此外,为了分析卖家和买家之间的关系,我们抓取了一类卖家的数据。(2)基于获得的数据集,分析淘宝共享平台用户行为特征。我们打算寻找在共享平台中以物品连接的卖家和买家之间的关系。令人惊讶的是,我们发现共享平台是一些买家为信用评分高的卖家做广告的工具,而其他买家只是帮助他们支持平台。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Scrapy-Based Crawling and User-Behavior Characteristics Analysis on Taobao
The widespread use of Internet provides a good environment for e-commerce. Study on e-commerce network characteristics always focuses on the Taobao. So far, researches based on Taobao are related to credit rating system, marketing strategy, analysis of characteristics of the seller and so on. The purpose of all these studies is to analyze online marketing transactions in e-commerce. In this paper, we analyze e-commerce network from the perspective of graph theory. Our contributions lie in two aspects as following: (1) crawl Taobao share-platform using Scrapy crawl architecture. After analyzing format of web pages in Taobao deeply, combined with the BFS and MHRW two kinds of sampling methods, we ran crawler on five PCs for 30 days. Besides, we list some big problems encountered in the crawling process, then give the final solution. In addition, we crawled one type of sellers' data in order to analyze relationships between sellers and buyers. (2) Analyze characteristics of users' behavior in Taobao share-platform based on obtained dataset. We intend to find the relationships between sellers and buyers connected by items in share-platform. Surprisingly, we find that share-platform is a tool for some buyers to advertise items for sellers who have high credit score, and other buyers only to help them to support the platform.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信