Perspectives of data mining in improving data collection processes in official statistics
M. Hudec, Jana Juriová
Advances in Methodology and Statistics, published 2013-07-01
DOI: https://doi.org/10.51936/rvlb1833
Citation count: 2
Abstract
Statistical offices are crucial institutions for collecting data about various aspects of society. Nevertheless, data collection must contend with survey nonresponse and missing values. Efforts to increase response rates and to estimate missing values therefore need continual improvement. The paper examines the advantages of soft computing techniques in small-scale case studies on reminder letters, classification of respondents, and estimation of missing values. Fuzzy sets assign membership degrees in the [0, 1] interval, which implies that similar entities can be treated similarly in reminders and, with some restrictions, in imputation. Neural networks are suitable when class boundaries are not easily definable and databases contain incomplete records. In such cases, a neural network can identify the most similar class for each entity, which enables the imputation of missing values. Finally, the paper discusses an efficient way to design and implement tools in cooperation among statistical institutes.
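To make the point about membership degrees concrete, the following minimal Python sketch contrasts a crisp cutoff with a fuzzy set "late respondent". It is an illustration only, not the paper's method: the variable `days_late`, the function names, and the thresholds (5, 12.5 and 20 days) are hypothetical, and the linear membership function is just one common choice.

```python
def late_membership(days_late: float, lower: float = 5.0, upper: float = 20.0) -> float:
    """Degree in [0, 1] to which a respondent belongs to the fuzzy set 'late'.

    Hypothetical linear (ramp) membership: 0 up to `lower` days,
    1 from `upper` days, and a straight line in between.
    """
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    if days_late <= lower:
        return 0.0
    if days_late >= upper:
        return 1.0
    return (days_late - lower) / (upper - lower)


def crisp_late(days_late: float, cutoff: float = 12.5) -> bool:
    """Crisp alternative: a respondent is either late or not (hypothetical cutoff)."""
    return days_late >= cutoff


if __name__ == "__main__":
    # Respondents 12 and 13 days late fall on opposite sides of the crisp cutoff,
    # but receive nearly the same fuzzy membership degree.
    for days in (4, 12, 13, 25):
        print(days, crisp_late(days), round(late_membership(days), 2))
```

Under these assumed thresholds, two respondents who are 12 and 13 days late are split by the crisp rule but get close membership degrees, so a reminder strategy driven by the degree would treat them similarly.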