Economies of scope in data aggregation: Evidence from health data
Bruno Carballa-Smichowski, Néstor Duch-Brown, Seyit Höcük, Pradeep Kumar, Bertin Martens, Joris Mulder, Patricia Prüfer
Information Economics and Policy, Volume 71, Article 101146, published 2025-08-07. DOI: 10.1016/j.infoecopol.2025.101146
Economies of scope in data aggregation (ESDA) arise from combining complementary datasets that cover the same observations. We estimate ESDA by progressively and randomly adding health and socioeconomic variables (predictors) to the machine-learning models we use to predict health outcomes. Holding the number of observations constant, we find that prediction quality increases with the number of variables. We observe a positive relationship between variable complementarity and ESDA. ESDA show signs of increasing returns followed by decreasing returns. We further observe a long tail of highly contributing predictors in our data. These findings indicate that the nature of returns to scope in data aggregation may depend on the distribution of the predictors' information content. This underscores the importance of variable characteristics in determining ESDA's potential to create data barriers to entry. These results can help policymakers design data-sharing initiatives such as the European Union's Common European Data Spaces.
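The estimation idea in the abstract (keep the observations fixed, add predictors one at a time in random order, and track prediction quality) can be illustrated with a minimal sketch. It assumes ordinary least squares as the prediction model, out-of-sample R² as the quality metric, and synthetic data from the hypothetical helper `make_synthetic` with made-up weights. The paper's actual machine-learning models, health data, and metrics are not reproduced here, so this is an illustration of the general technique, not the authors' pipeline.

```python
import random
from statistics import fmean


def make_synthetic(n=400, p=8, seed=1):
    """Hypothetical data: n observations, p independent predictors.

    The weights are an illustrative assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    weights = [1.0 / (j + 1) for j in range(p)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(w * v for w, v in zip(weights, row)) + rng.gauss(0.0, 0.5) for row in X]
    return X, y


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """OLS with an intercept, via the normal equations X'X b = X'y."""
    X = [[1.0] + row for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(y_true, y_pred):
    """Out-of-sample R^2, used here as the (assumed) prediction-quality metric."""
    mu = fmean(y_true)
    ss_res = sum((t - q) ** 2 for t, q in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def scope_curve(X, y, n_train, n_orders=20, seed=0):
    """Mean test R^2 for k = 1..p predictors, added in random order.

    The rows (observations) are the same for every k; only the number of
    columns (predictors) changes.
    """
    rng = random.Random(seed)
    p = len(X[0])
    scores = [[] for _ in range(p)]
    for _ in range(n_orders):
        order = list(range(p))
        rng.shuffle(order)  # one random order in which predictors are added
        for k in range(1, p + 1):
            sub = [[row[c] for c in order[:k]] for row in X]
            beta = fit_ols(sub[:n_train], y[:n_train])
            pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], row))
                    for row in sub[n_train:]]
            scores[k - 1].append(r_squared(y[n_train:], pred))
    return [fmean(s) for s in scores]


X, y = make_synthetic()
curve = scope_curve(X, y, n_train=300)
# Marginal gain in mean R^2 from each additional predictor.
gains = [b - a for a, b in zip([0.0] + curve, curve)]
for k, (score, gain) in enumerate(zip(curve, gains), start=1):
    print(f"k={k}: R^2={score:.3f}, gain={gain:+.3f}")
```

The `gains` list (differences between consecutive entries of `curve`) is one way to inspect whether returns to adding predictors rise or fall. In this synthetic setup the shape is driven entirely by the assumed weights and says nothing about the paper's findings.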
About the journal:
IEP is an international journal that aims to publish peer-reviewed policy-oriented research about the production, distribution and use of information, including these subjects: the economics of the telecommunications, mass media, and other information industries, the economics of innovation and intellectual property, the role of information in economic development, and the role of information and information technology in the functioning of markets. The purpose of the journal is to provide an interdisciplinary and international forum for theoretical and empirical research that addresses the needs of other researchers, government, and professionals who are involved in the policy-making process. IEP publishes research papers, short contributions, and surveys.