{"title":"基于树的缺失数据输入方法","authors":"P. Vateekul, Kanoksri Sarinnapakorn","doi":"10.1109/ICDMW.2009.92","DOIUrl":null,"url":null,"abstract":"Missing data is a well-recognized issue in data mining, and imputation is one way to handle the problem. In this paper, we propose a novel tree-based imputation algorithm called “Imputation Tree” (ITree). It first studies the predictability of missingness using all observations by constructing a binary classification tree called “Missing Pattern Tree” (MPT). Then, missing values in each cluster or terminal node are estimated by a regression tree of observations at that node. We present empirical results using both synthetic and real data. Almost all experiments demonstrate that ITree is superior to other commonly used methods in estimating missing values. The algorithm not only produces an impressive accuracy, but also provides information on the nature of missingness.","PeriodicalId":351078,"journal":{"name":"2009 IEEE International Conference on Data Mining Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"Tree-Based Approach to Missing Data Imputation\",\"authors\":\"P. Vateekul, Kanoksri Sarinnapakorn\",\"doi\":\"10.1109/ICDMW.2009.92\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Missing data is a well-recognized issue in data mining, and imputation is one way to handle the problem. In this paper, we propose a novel tree-based imputation algorithm called “Imputation Tree” (ITree). It first studies the predictability of missingness using all observations by constructing a binary classification tree called “Missing Pattern Tree” (MPT). Then, missing values in each cluster or terminal node are estimated by a regression tree of observations at that node. We present empirical results using both synthetic and real data. Almost all experiments demonstrate that ITree is superior to other commonly used methods in estimating missing values. The algorithm not only produces an impressive accuracy, but also provides information on the nature of missingness.\",\"PeriodicalId\":351078,\"journal\":{\"name\":\"2009 IEEE International Conference on Data Mining Workshops\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-12-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 IEEE International Conference on Data Mining Workshops\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDMW.2009.92\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE International Conference on Data Mining Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDMW.2009.92","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Missing data is a well-recognized issue in data mining, and imputation is one way to handle the problem. In this paper, we propose a novel tree-based imputation algorithm called “Imputation Tree” (ITree). It first studies the predictability of missingness using all observations by constructing a binary classification tree called “Missing Pattern Tree” (MPT). Then, missing values in each cluster or terminal node are estimated by a regression tree of observations at that node. We present empirical results using both synthetic and real data. Almost all experiments demonstrate that ITree is superior to other commonly used methods in estimating missing values. The algorithm not only produces an impressive accuracy, but also provides information on the nature of missingness.