Comprehensive Data Management and Analytics for General Society Survey Dataset

Zhiwen Pan, S. Zhao, J. Pacheco, Yu-xin Zhang, Xiaofan Song, Yiqiang Chen, L. Dai, Jun Zhang
DOI: 10.1145/3371238.3371269 (https://doi.org/10.1145/3371238.3371269)
Published in: Proceedings of the 4th International Conference on Crowd Science and Engineering
Publication date: 2019-10-18
Citation count: 0

Abstract

The General Society Survey (GSS) is a government-funded survey that examines the socio-economic status, quality of life, and structure of contemporary society. The GSS dataset is regarded as one of the authoritative sources on which government and organizational practitioners base data-driven policies. Previous analytic approaches for GSS datasets combine expert knowledge with simple statistics. In this paper, we propose a comprehensive data management and data mining approach for GSS datasets. The approach operates in two phases: a data management phase, which improves the quality of GSS data through attribute preprocessing and filter-based attribute selection; and a data mining phase, which extracts hidden knowledge from the dataset through prediction, classification, association, and clustering analysis. By leveraging data mining techniques, the proposed approach can explore knowledge in a fine-grained manner with minimal human intervention. Experiments on the Chinese General Social Survey dataset are conducted to evaluate the performance of the approach.
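As a rough illustration of the data management phase described in the abstract, the sketch below drops incomplete survey records and then ranks the remaining attributes with a filter criterion. The abstract does not say which filter the authors use, so information gain is an assumption here, as are the missing-value codes, the function names, and the toy records, none of which come from the paper or from any real GSS dataset.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(attribute, target):
    """Reduction in target entropy after splitting on one attribute."""
    total = len(target)
    groups = {}
    for a, t in zip(attribute, target):
        groups.setdefault(a, []).append(t)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(target) - conditional


def preprocess(rows, missing_codes=frozenset({"", "NA", "refused"})):
    """Drop respondents with any missing or refused answer (hypothetical codes)."""
    return [r for r in rows if not any(v in missing_codes for v in r.values())]


def select_attributes(rows, target_name, k):
    """Rank attributes by information gain against the target; keep the top k."""
    target = [r[target_name] for r in rows]
    names = [n for n in rows[0] if n != target_name]
    scores = {n: information_gain([r[n] for r in rows], target) for n in names}
    return sorted(names, key=lambda n: scores[n], reverse=True)[:k], scores


# Toy, made-up survey records; not taken from any real GSS dataset.
survey = [
    {"education": "high", "region": "east", "income": "high"},
    {"education": "high", "region": "west", "income": "high"},
    {"education": "low", "region": "east", "income": "low"},
    {"education": "low", "region": "west", "income": "low"},
    {"education": "NA", "region": "east", "income": "low"},
]

clean = preprocess(survey)
selected, scores = select_attributes(clean, target_name="income", k=1)
print(selected, scores)
```

A filter method of this kind scores each attribute independently of any downstream model, so the same reduced attribute set can feed the prediction, classification, association, and clustering analyses of the second phase.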