Comparing Price Indices of Clothing and Footwear for Scanner Data and Web Scraped Data

A. Chessa, R. Griffioen
Journal: Economie et Statistique / Economics and Statistics
DOI: 10.24187/ecostat.2019.509.1984 (https://doi.org/10.24187/ecostat.2019.509.1984)
Published: 2019-10-03
Publication type: Journal Article
Citations: 5

Abstract

Statistical institutes are considering web scraping of online consumer goods prices as a feasible alternative to scanner data. Because web scraped data lack transaction information, the question arises whether they are suited for price index calculation. This article investigates this question by comparing price indices based on web scraped and scanner data for clothing and footwear in the same webshop. Scanner data and web scraped prices are often equal, with the latter being slightly higher on average. The numbers of web scraped product prices and of products sold show remarkably high correlations. Given the high churn rates of clothing products, a multilateral method (Geary-Khamis) was used to calculate price indices. For 16 product categories, the indices show small overall differences between the two data sources, with year-on-year indices differing by only 0.3 percentage points at COICOP level (men's and women's clothing). It remains to be investigated whether such promising results for web scraped data will also be found for other retailers.
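
The abstract names the Geary-Khamis method but gives no computational detail. The following is a minimal sketch of the textbook Geary-Khamis fixed-point iteration, not the authors' implementation. The function name `geary_khamis`, the input layout (one dict per period), the convergence tolerance, and the toy figures are all assumptions made for illustration. The sketch also assumes that quantities are available, as they are in scanner data, and that every item has a positive total quantity over the window.

```python
def geary_khamis(prices, quantities, tol=1e-10, max_iter=1000):
    """Textbook Geary-Khamis index, normalised to 1.0 in period 0.

    prices, quantities: lists with one dict per period, mapping an
    item id to its price / quantity sold in that period. Items may
    enter or leave the assortment (churn); absent items are simply
    missing from that period's dicts.
    """
    n_periods = len(prices)
    items = set().union(*(q.keys() for q in quantities))
    index = [1.0] * n_periods  # starting guess for the price index

    for _ in range(max_iter):
        # Step 1: reference price of each item = quantity-weighted
        # average of its deflated prices over all periods.
        ref_price = {}
        for i in items:
            total_q = sum(q.get(i, 0.0) for q in quantities)
            deflated = sum(
                quantities[t][i] * prices[t][i] / index[t]
                for t in range(n_periods)
                if i in quantities[t]
            )
            ref_price[i] = deflated / total_q

        # Step 2: index = turnover divided by the volume valued at
        # reference prices.
        new_index = []
        for t in range(n_periods):
            turnover = sum(prices[t][i] * q for i, q in quantities[t].items())
            volume = sum(ref_price[i] * q for i, q in quantities[t].items())
            new_index.append(turnover / volume)

        # Normalise so that the first period equals 1.
        new_index = [x / new_index[0] for x in new_index]

        if max(abs(a - b) for a, b in zip(new_index, index)) < tol:
            return new_index
        index = new_index

    return index


# Hypothetical toy data: item "b" leaves after period 1 and item "c"
# enters in period 1, mimicking the churn mentioned in the abstract.
toy_prices = [
    {"a": 10.0, "b": 20.0},
    {"a": 11.0, "b": 19.0, "c": 30.0},
    {"a": 12.0, "c": 27.0},
]
toy_quantities = [
    {"a": 5.0, "b": 3.0},
    {"a": 4.0, "b": 2.0, "c": 6.0},
    {"a": 6.0, "c": 8.0},
]
print(geary_khamis(toy_prices, toy_quantities))
```

Because the reference prices pool information from every period in the window, an item that appears in only some periods still contributes to the index. This is one reason multilateral methods are often preferred when churn is high. Web scraped data carry no quantities, so applying the same formula to them would require some choice of proxy weights, which this sketch does not cover.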