{"title":"Building trust and facilitating use of data","authors":"Francesca Perucci, Eric Swanson","doi":"10.3233/sji-240006","DOIUrl":"https://doi.org/10.3233/sji-240006","url":null,"abstract":"Multiple crises, including the COVID-19 pandemic and increased frequency and intensity of disasters related to climate change, have demonstrated the critical importance of timely and open access to trusted data. Open data principles and practices that facilitate data access and use, relevance to policy needs, and increase the impact and value of data are central to building trust in data. The paper outlines four trends that present opportunities for expanding adoption and use of open data principles and practices and building data trust: the modernization of data governance; increased attention to the role of citizens in building trust and increasing the relevance of data and citizens’ contribution to data throughout the data value chain; the adoption of open data principles; and the work of watchdog organizations monitoring the progress of countries and agencies and identifying areas of data governance that still need attention.","PeriodicalId":55877,"journal":{"name":"Statistical Journal of the IAOS","volume":"177 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140469907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing commercial grape farm efficiency in Armavir region (Armenia) by using two-stage empirical approach","authors":"H. Asatryan, V. Aleksanyan, Samvel Asatryan, M. Manucharyan","doi":"10.3233/sji-230064","DOIUrl":"https://doi.org/10.3233/sji-230064","url":null,"abstract":"The purpose of this paper is to provide an empirical assessment of the economic efficiency of grape-producing farms in Armenia. Upon reviewing various field-related studies, frontier analysis was singled out as the methodological basis of this study. More specifically, a two-stage empirical analysis was performed, which includes the measurement of efficiency levels of grape farms by implementing the DEA technique and then assessing the determinants of the obtained efficiency scores by performing Tobit modeling. To obtain the necessary data, 365 grape farms from the Armavir region were surveyed. The main findings of this paper suggest that the average efficiency score for grape farms is 0.72, and there is room for a 28% improvement in the economic performance of farms. The main determinants of farm efficiency were cultivated grape varieties, farm size, and selling prices of grapes. The obtained results mainly support the findings of similar studies carried out for various viticulture regions across the world. This study provides a methodological basis for further expansion of similar studies, both in terms of including other Armenian viticulture regions and different years to explore changes in the efficiency of grape farms over time. This article provides a base of knowledge for policymakers, scholars, researchers, investors, and credit companies for their decision-making processes and other purposes.","PeriodicalId":55877,"journal":{"name":"Statistical Journal of the IAOS","volume":"28 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140499035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collaboration between national statistical offices and academia: Benefits, conditions, areas of collaboration and practical level experience in countries","authors":"Charlotte Juul Hansen, Lina Maria Sanchez Cespedes, Leonardo Trujillo Oyola, X. K. Dimakos, Bianca Walsh, Renata Souza Bueno, Amos T. Kabo-Bah, Omar Seidu, Vibeke Oestreich Nielsen","doi":"10.3233/sji-230117","DOIUrl":"https://doi.org/10.3233/sji-230117","url":null,"abstract":"National statistical offices (NSOs) and academia benefit from establishing partnerships and collaborating in different ways by bringing together their respective expertise. Collaborative alliances of this nature appear to offer numerous advantages for both the partners and the public and seem to be essential for unlocking opportunities within the evolving data ecosystem. Establishing good and fruitful collaboration between academia and NSOs requires a collaborative environment where each partner can see the benefits of the collaboration and how they could contribute. Different areas of collaboration are presented within four categories: education and learning, research, promotion of data use in society and providing services to each other. The article further discusses the benefits and conditions of a successful partnership. Examples from Brazil, Colombia, Ghana, and Norway showcase practical-level experiences and some lessons learned at the country level.","PeriodicalId":55877,"journal":{"name":"Statistical Journal of the IAOS","volume":"37 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140498342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial and demographic distributions of personal insolvency: An opportunity for official statistics","authors":"Jonas Klingwort, Sven Alexander Brocker, Christian Borgs","doi":"10.3233/sji-230072","DOIUrl":"https://doi.org/10.3233/sji-230072","url":null,"abstract":"German official statistics publish statistics on personal insolvency. These statistics have been recently enhanced using web scraping to extract additional information from a public website on which the insolvency announcements are published. The currently scraped data is used for quality assurance and to derive an early indicator of personal insolvency. This paper provides novel methodological analyses for the same administrative database and presents further opportunities to improve the current official statistics regarding detail and timeliness using web scraping and text mining. These newly derived statistics inform on several aspects regarding personal insolvency’s demographic and spatial distribution.","PeriodicalId":55877,"journal":{"name":"Statistical Journal of the IAOS","volume":"30 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138997362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unbiased estimation strategies for respondent driven sampling","authors":"P. D. Falorsi, G. Alleva, Francesca Petrarca","doi":"10.3233/sji-230087","DOIUrl":"https://doi.org/10.3233/sji-230087","url":null,"abstract":"In this paper, we focus on respondent-driven sampling (RDS), which is a valuable survey methodology to estimate the size and the characteristics of hidden or hard-to-measure population groups. The RDS methodology makes it possible to gather information on these populations by exploiting the relationships between their components. However, RDS suffers from the lack of an estimation methodology that is sufficiently robust to accommodate the varying conditions under which it is applied. In this paper, we address the estimation problem of the RDS methodology and, by approaching it as a particular indirect sampling technique, we propose three unbiased estimation methods as possible solutions.","PeriodicalId":55877,"journal":{"name":"Statistical Journal of the IAOS","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139254869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hard-to-reach population groups in administrative sources: main challenges and future work","authors":"Donatella Zindato, Maciej Truszczynski","doi":"10.3233/sji-230074","DOIUrl":"https://doi.org/10.3233/sji-230074","url":null,"abstract":"The paper deals with the concept and the definitions of hard-to-reach groups and the ways of capturing them in administrative sources, providing a detailed discussion of the meaning of hard-to-reach in the context of administrative sources and in relation to the traditional hard-to-count groups in censuses and surveys. The review of country practices shows that hard-to-reach populations in administrative data can be interpreted in different ways and that their definition is dependent on countries’ circumstances, though there are two main reasons for identifying a group as hard-to-reach in administrative sources. One of the interpretations is selecting some groups, typically considered difficult to reach with traditional survey methods (such as homeless, illegal immigrants or indigenous people), and then trying to capture them in registers to overcome the challenges of traditional field collection or to get more complete information. At first glance, administrative data might offer the potential to improve frame coverage for some target populations, but may also lead to other hard-to-reach or “hidden” populations for different population groups. Indeed, another interpretation refers to the incompleteness of registers or linked administrative databases, which makes some groups, such as children or elders, hard to reach and hence hard to describe with data, due to time lag in reporting of some events or to other accuracy problems with the source itself. The paper summarizes the experience of national statistical offices in accessing hard-to-reach groups and describes problems and challenges in capturing them. It also proposes further possible work to improve access to hard-to-reach groups using administrative data.","PeriodicalId":55877,"journal":{"name":"Statistical Journal of the IAOS","volume":"16 11","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139254707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning estimation of the resident population","authors":"Violeta Calian, Margherita Zuppardo, Omar Hardarson","doi":"10.3233/sji-230090","DOIUrl":"https://doi.org/10.3233/sji-230090","url":null,"abstract":"In this paper, we formulate the problem of estimating the resident population, i.e. correcting for over-counts in administrative register data, as a binary classification problem. We propose a solution based on machine learning algorithms. The selection and the optimisation of the best algorithm are shown to depend on the goal of prediction. We illustrate this method for two important cases of official statistics, Census resident population and survey design with minimum non-response. The performance of the algorithms, the uncertainty of estimates and of the evaluation metrics are described in detail and implemented in shared, open-source code. We exemplify with the results obtained by applying this method to Icelandic register and survey data.","PeriodicalId":55877,"journal":{"name":"Statistical Journal of the IAOS","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139260061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web scraping for price statistics in the Philippines","authors":"Manuel Leonard F. Albis, Sabrina O. Romasoc, Shushimita G. Pelayo, Bea Andrea C. Gavira, Jazzen Paul J. Asombrado","doi":"10.3233/sji-230030","DOIUrl":"https://doi.org/10.3233/sji-230030","url":null,"abstract":"Official price statistics in the Philippines are mainly sourced from regular surveys and censuses, which entail high costs. As businesses move onto digital platforms, alternatives to these traditional data sources have become more available, one of which is web scraping, a process of collecting information from the web. As digital and online platforms become increasingly utilized for commerce, web scraping offers a way to increase the frequency of data collection while reducing its cost compared to price surveys. This paper provides a survey of experiences of various government statistical agencies in their conduct of web scraping for the Consumer Price Index (CPI). Moreover, it details the Philippines’ experience using web-scraped data to estimate the food and alcoholic beverages CPI of the National Capital Region and compares this estimate with the official CPI estimate of the Philippine Statistics Authority. Finally, this paper discusses the challenges encountered and the recommendations for enhancing the approach.","PeriodicalId":55877,"journal":{"name":"Statistical Journal of the IAOS","volume":"30 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139264876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"To count or to estimate: A note on compiling population estimates from administrative data","authors":"John Dunne, Francesca Kay, Timothy Linehan","doi":"10.3233/sji-230067","DOIUrl":"https://doi.org/10.3233/sji-230067","url":null,"abstract":"Like many countries, Ireland has been researching new systems of population estimates compiled using administrative data. Ireland does not have a Central Population Register from which the estimates can be compiled. The primary step in compiling population estimates from administrative data is to first build a Statistical Population Dataset (SPD). Ideally an SPD will have one record for each person in the population containing the relevant attributes. The ideal SPD then allows compilation of statistics by simply counting over records. In practice, the compilation of SPDs is prone to error. These errors can be classified into four types: overcoverage, undercoverage, domain misclassification and linkage error. Ireland, to date, has investigated two different approaches to the compilation of population estimates from administrative data. The first, labeled in this paper as the simple count method, is based on building an SPD which minimises the overall number of individual record errors such that simple counts from the SPD will provide population estimates. The second, labeled in this paper as the estimation method, is based on building an SPD which aims to eliminate all error types bar that of undercoverage and then adjusts counts for undercoverage using Dual System Estimation (DSE) methods to obtain population estimates. This paper explores the advantages and disadvantages of both methods before considering how they could be integrated to eliminate the disadvantages. Many NSIs will be considering similar challenges when compiling annual Census-like population estimates and this paper aims to contribute to that discussion.","PeriodicalId":55877,"journal":{"name":"Statistical Journal of the IAOS","volume":"6 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139271001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classifying respondent comments from the 2021 Canadian Census of Population using machine learning methods","authors":"Joanne Yoon","doi":"10.3233/sji-230063","DOIUrl":"https://doi.org/10.3233/sji-230063","url":null,"abstract":"To improve the analysis of respondent comments from the Canadian Census of Population, data scientists at Statistics Canada compared and evaluated traditional machine learning, deep learning and transformer-based techniques. Cross-lingual Language Model-Robustly Optimized Bidirectional Encoder Representations from Transformers (XLM-R), a cross-lingual language model, fine-tuned on census respondent comments, yields the best result, an overall F1 score of 89.91%, despite language and class imbalances. Following the evaluation, the fine-tuned model was implemented successfully to objectively categorize comments from the 2021 Census of Population, with high accuracy. As a result, feedback from respondents was directed to the appropriate subject matter analysts, for them to analyze post-collection.","PeriodicalId":55877,"journal":{"name":"Statistical Journal of the IAOS","volume":"46 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139276268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}