Zhiwen Pan, S. Zhao, J. Pacheco, Yu-xin Zhang, Xiaofan Song, Yiqiang Chen, L. Dai, Jun Zhang
Proceedings of the 4th International Conference on Crowd Science and Engineering, published 2019-10-18. DOI: 10.1145/3371238.3371269 (https://doi.org/10.1145/3371238.3371269)
Comprehensive Data Management and Analytics for General Society Survey Dataset
The General Society Survey (GSS) is a government-funded survey that examines the socio-economic status, quality of life, and structure of contemporary society. The GSS dataset is regarded as one of the authoritative sources from which government and organization practitioners make data-driven policies. Previous analytic approaches for GSS datasets were designed by combining expert knowledge with simple statistics. In this paper, we propose a comprehensive data management and data mining approach for GSS datasets. The approach operates in two phases: a data management phase, which improves the quality of GSS data through attribute preprocessing and filter-based attribute selection; and a data mining phase, which extracts hidden knowledge from the dataset through prediction analysis, classification analysis, association analysis, and clustering analysis. By leveraging data mining techniques, the proposed approach can explore knowledge in a fine-grained manner with minimal human intervention. Experiments on the Chinese General Social Survey dataset are conducted to evaluate the performance of the approach.
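The abstract does not say which filter criterion the data management phase uses. As a rough illustration only, the sketch below assumes categorical survey attributes and ranks them by information gain against a target attribute, which is one common filter-based selection score and not necessarily the authors' choice. The function names, the toy records, and the attribute names (`education`, `region`, `income`) are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(attribute, labels):
    """Reduction in label entropy after grouping records by one attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_attributes(rows, label_key, k):
    """Score every non-target attribute and keep the k highest-scoring ones."""
    labels = [row[label_key] for row in rows]
    names = [name for name in rows[0] if name != label_key]
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in names
    }
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores


# Hypothetical toy survey records, not taken from any real GSS dataset.
rows = [
    {"education": "high", "region": "east", "income": "high"},
    {"education": "high", "region": "west", "income": "high"},
    {"education": "low", "region": "east", "income": "low"},
    {"education": "low", "region": "west", "income": "low"},
]
selected, scores = select_attributes(rows, label_key="income", k=1)
print(selected)
```

Because a filter method scores each attribute independently of any downstream model, the same reduced attribute set could feed all four analyses the abstract lists (prediction, classification, association, clustering).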