Web scraping for price statistics in the Philippines
Manuel Leonard F. Albis, Sabrina O. Romasoc, Shushimita G. Pelayo, Bea Andrea C. Gavira, Jazzen Paul J. Asombrado
Statistical Journal of the IAOS, published 2023-11-17
DOI: https://doi.org/10.3233/sji-230030
Abstract
Official price statistics in the Philippines are sourced mainly from regular surveys and censuses, which entail high costs. As businesses move onto digital platforms, alternatives to these traditional data sources have become more available. One of these is web scraping, the process of collecting information from the web. As digital and online platforms are increasingly used for commerce, web scraping offers a way to increase the frequency of data collection while reducing its cost relative to price surveys. This paper surveys the experiences of various government statistical agencies in using web scraping for the Consumer Price Index (CPI). It also details the Philippines’ experience using web-scraped data to estimate the food and alcoholic beverages CPI of the National Capital Region, and compares that estimate with the official CPI estimate of the Philippine Statistics Authority. Finally, the paper discusses the challenges encountered and recommendations for enhancing the approach.
About the journal:
The Statistical Journal of the IAOS is the flagship journal of the International Association for Official Statistics and is expected to be widely circulated and subscribed to by individuals and institutions in all parts of the world. The main aim of the Journal is to support the IAOS mission by publishing articles that promote the understanding and advancement of official statistics and foster the development of effective and efficient official statistical services on a global basis. Papers are expected to be of wide interest to readers. Such papers may or may not contain strictly original material. All papers are refereed.