{"title":"评估数据融合方法以改进收入建模","authors":"Jana Emmenegger, R. Münnich, Jannik Schaller","doi":"10.1093/jssam/smac033","DOIUrl":null,"url":null,"abstract":"\n Income is an important economic indicator to measure living standards and individual well-being. In Germany, different data sources yield ambiguous evidence for analyzing the income distribution. The Tax Statistics (TS)—an income register recording the total population of more than 40 million taxpayers in Germany for the year 2014—contains the most reliable income information covering the full income distribution. However, it offers only a limited range of socio-demographic variables essential for income analysis. We tackle this challenge by enriching the tax data with information on education and working time from the Microcensus, a representative 1 percent sample of the German population. We examine two types of data fusion methods well suited to the specific data fusion scenario of the TS and the Microcensus: missing-data methods and performant prediction models. We conduct a simulation study and provide an empirical application comparing the proposed data fusion methods, and our results indicate that Multinomial Regression and Random Forest are the most suitable methods for our data fusion scenario.","PeriodicalId":1,"journal":{"name":"Accounts of Chemical Research","volume":null,"pages":null},"PeriodicalIF":16.4000,"publicationDate":"2023-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Evaluating Data Fusion Methods to Improve Income Modeling\",\"authors\":\"Jana Emmenegger, R. Münnich, Jannik Schaller\",\"doi\":\"10.1093/jssam/smac033\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\n Income is an important economic indicator to measure living standards and individual well-being. In Germany, different data sources yield ambiguous evidence for analyzing the income distribution. The Tax Statistics (TS)—an income register recording the total population of more than 40 million taxpayers in Germany for the year 2014—contains the most reliable income information covering the full income distribution. However, it offers only a limited range of socio-demographic variables essential for income analysis. We tackle this challenge by enriching the tax data with information on education and working time from the Microcensus, a representative 1 percent sample of the German population. We examine two types of data fusion methods well suited to the specific data fusion scenario of the TS and the Microcensus: missing-data methods and performant prediction models. We conduct a simulation study and provide an empirical application comparing the proposed data fusion methods, and our results indicate that Multinomial Regression and Random Forest are the most suitable methods for our data fusion scenario.\",\"PeriodicalId\":1,\"journal\":{\"name\":\"Accounts of Chemical Research\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":16.4000,\"publicationDate\":\"2023-03-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Accounts of Chemical Research\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1093/jssam/smac033\",\"RegionNum\":1,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Accounts of Chemical Research","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/jssam/smac033","RegionNum":1,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
Evaluating Data Fusion Methods to Improve Income Modeling
Income is an important economic indicator to measure living standards and individual well-being. In Germany, different data sources yield ambiguous evidence for analyzing the income distribution. The Tax Statistics (TS)—an income register recording the total population of more than 40 million taxpayers in Germany for the year 2014—contains the most reliable income information covering the full income distribution. However, it offers only a limited range of socio-demographic variables essential for income analysis. We tackle this challenge by enriching the tax data with information on education and working time from the Microcensus, a representative 1 percent sample of the German population. We examine two types of data fusion methods well suited to the specific data fusion scenario of the TS and the Microcensus: missing-data methods and performant prediction models. We conduct a simulation study and provide an empirical application comparing the proposed data fusion methods, and our results indicate that Multinomial Regression and Random Forest are the most suitable methods for our data fusion scenario.
期刊介绍:
Accounts of Chemical Research presents short, concise and critical articles offering easy-to-read overviews of basic research and applications in all areas of chemistry and biochemistry. These short reviews focus on research from the author’s own laboratory and are designed to teach the reader about a research project. In addition, Accounts of Chemical Research publishes commentaries that give an informed opinion on a current research problem. Special Issues online are devoted to a single topic of unusual activity and significance.
Accounts of Chemical Research replaces the traditional article abstract with an article "Conspectus." These entries synopsize the research affording the reader a closer look at the content and significance of an article. Through this provision of a more detailed description of the article contents, the Conspectus enhances the article's discoverability by search engines and the exposure for the research.