M. Giardina, Yongyang Huo, F. Azuaje, P. Mccullagh, R. Harper
Published in: 18th IEEE Symposium on Computer-Based Medical Systems (CBMS'05), 2005-06-23. DOI: 10.1109/CBMS.2005.13 (https://doi.org/10.1109/CBMS.2005.13)
A missing data estimation analysis in type II diabetes databases
Type II diabetes is one of the most common causes of disability and death in the United Kingdom. This investigation analysed data acquired from diabetic patients at the Ulster Hospital in Northern Ireland in terms of statistical descriptive indicators and missing values. Such data are noisy and incomplete. This paper reports a comprehensive missing data estimation analysis. Five missing value imputation methods were compared, including k-Nearest Neighbours (k-NN) and correlation-based estimation models. The analysis indicates that a feature-based correlation method known as EMImpute_Columns is a promising approach to estimating missing values. Nevertheless, k-NN methods may be useful for providing relatively accurate estimates with lower error variability. These estimation techniques will support the implementation of supervised and unsupervised learning tools for coronary heart disease risk assessment, a major complication of diabetes.
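To make the k-NN idea mentioned in the abstract concrete, the following is a minimal sketch of k-nearest-neighbour imputation. It is not the authors' implementation: the function name `knn_impute`, the choice of a Euclidean distance averaged over the features two records share, and the toy records are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Hypothetical helper for illustration only. Distance is the root mean
    squared difference over the features observed in both rows.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour must be a different row with feature j observed.
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda pair: pair[0])
                nearest = candidates[:k]
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Made-up records (not patient data); None marks a missing value.
example = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [5.0, 6.0, 7.0],
    [1.0, 2.0, None],
]
completed = knn_impute(example, k=2)
print(completed[3])
```

In this toy case the last record's missing feature is estimated from the two records closest to it on the observed features. In practice the features would usually be standardised first so that no single variable dominates the distance.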