Daniel Gerszon Mahler, Umar Serajuddin, Hiroko Maeda
Statistical Journal of the IAOS. Published 2023-05-30. DOI: 10.3233/sji-220090 (https://doi.org/10.3233/sji-220090)
When is there enough data to create a global statistic?
To monitor progress towards global goals such as the Sustainable Development Goals, global statistics are needed. Yet cross-country datasets are rarely truly global, creating a trade-off for producers of global statistics: the lower the data coverage threshold for disseminating global statistics, the more statistics can be made available, but the less accurate they will be. We quantify this availability-accuracy trade-off by running more than 10 million simulations on the World Development Indicators. We show that if the fraction of the world’s population for which one lacks data is x, then one should expect to be 0.37x standard deviations off the true global value, and one risks being as much as x standard deviations off. We show the robustness of this result to various assumptions and give recommendations on when there is enough data to create global statistics. Though the decision will be context specific, in a baseline scenario we suggest not creating global statistics when there is data for less than half of the world’s population.
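The rule of thumb in the abstract can be written out as a small calculation. The Python sketch below is illustrative only: the function name, the dataclass, and its fields are hypothetical and do not come from the paper. Only the 0.37 factor, the worst-case bound of x, and the baseline threshold of half the world's population are taken from the abstract.

```python
from dataclasses import dataclass

# Both constants are taken from the abstract; everything else is illustrative.
EXPECTED_ERROR_FACTOR = 0.37   # expected error is 0.37 * x standard deviations
BASELINE_MIN_COVERAGE = 0.5    # baseline: no global statistic below half coverage


@dataclass(frozen=True)
class CoverageAssessment:
    """Hypothetical container for the rule-of-thumb results."""
    missing_share: float      # x: share of world population without data
    expected_error_sd: float  # expected distance from the true value, in SDs
    max_error_sd: float       # distance one risks being off by, in SDs
    meets_baseline: bool      # True if coverage is at least one half


def assess_coverage(covered_population: float,
                    world_population: float) -> CoverageAssessment:
    """Apply the abstract's rule of thumb to a population coverage figure."""
    if world_population <= 0:
        raise ValueError("world_population must be positive")
    if not 0 <= covered_population <= world_population:
        raise ValueError("covered_population must lie in [0, world_population]")

    coverage = covered_population / world_population
    missing_share = 1.0 - coverage
    return CoverageAssessment(
        missing_share=missing_share,
        expected_error_sd=EXPECTED_ERROR_FACTOR * missing_share,
        max_error_sd=missing_share,
        meets_baseline=coverage >= BASELINE_MIN_COVERAGE,
    )
```

For example, `assess_coverage(6.0, 8.0)` describes an indicator with data for 6 of 8 billion people: a quarter of the population is missing, so the expected error is roughly 0.09 standard deviations and the risk is up to 0.25. Because the abstract notes that the decision is context specific, the threshold is best read as an adjustable default and not a fixed cutoff.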
About the journal:
This is the flagship journal of the International Association for Official Statistics and is expected to be widely circulated and subscribed to by individuals and institutions in all parts of the world. The main aim of the Journal is to support the IAOS mission by publishing articles to promote the understanding and advancement of official statistics and to foster the development of effective and efficient official statistical services on a global basis. Papers are expected to be of wide interest to readers. Such papers may or may not contain strictly original material. All papers are refereed.