Data Fusion Through Statistical Matching
P. van der Putten, J. Kok, Amarjeet R. Gupta
Econometrics: Applied Econometrics & Modeling eJournal. Published 2015-08-12. DOI: 10.2139/ssrn.297501 (https://doi.org/10.2139/ssrn.297501)

In data mining applications, the availability of data is often a serious problem. For instance, elementary customer information resides in customer databases, but market survey data are available only for a subset of the customers or even for a different sample of customers. Data fusion provides a way out by combining information from different sources into a single data set for further data mining. While a significant amount of work has been done on data fusion, most of that research has been performed outside the data mining community. In this paper, we provide an overview of data fusion, introduce basic terminology and the statistical matching approach, distinguish between internal and external evaluation, and conclude with a larger case study.
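
To make the idea of statistical matching concrete, here is a minimal sketch in Python, not the authors' implementation. It assumes the common textbook setup: a recipient set (for example, the customer database) and a donor set (for example, the survey) share some common variables, and each recipient record receives the survey-only variables of its nearest donor. The function name `fuse`, the toy records, and the unscaled Euclidean distance are all illustrative assumptions; the abstract does not specify a distance measure or matching rule.

```python
from math import sqrt


def fuse(recipients, donors, common, fusion):
    """Copy the fusion variables of the nearest donor onto each recipient.

    Illustrative assumption: nearest means smallest unscaled Euclidean
    distance on the common variables. Real applications would usually
    standardize or weight these variables first.
    """
    fused = []
    for r in recipients:
        # Distance is measured only on the variables both sources share.
        best = min(
            donors,
            key=lambda d: sqrt(sum((r[c] - d[c]) ** 2 for c in common)),
        )
        row = dict(r)  # copy, so the recipient record is left unchanged
        row.update({f: best[f] for f in fusion})
        fused.append(row)
    return fused


# Hypothetical toy data: a customer database (recipient) and a survey (donor).
customers = [{"age": 25, "income": 30}, {"age": 60, "income": 80}]
survey = [
    {"age": 27, "income": 32, "owns_car": 0},
    {"age": 58, "income": 75, "owns_car": 1},
]

fused = fuse(customers, survey, common=["age", "income"], fusion=["owns_car"])
```

In this sketch each customer ends up with an `owns_car` value borrowed from the most similar survey respondent, so the fused data set holds variables from both sources for every recipient. Whether such borrowed values are trustworthy is what the internal and external evaluation mentioned in the abstract is meant to assess.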