Predicting the characteristics of people living in the South USA using logistic regression and decision tree
R. Serban, Andrzej Kupraszewicz, Gongzhu Hu
2011 9th IEEE International Conference on Industrial Informatics, 2011-07-26
https://doi.org/10.1109/INDIN.2011.6034974
Analysis of social data is at the core of social studies and is an important application area of data mining and knowledge discovery. One aspect of such analysis is based on demographic and/or economic data. In this paper, we apply data mining techniques to find the characteristics of people living in the southern United States. The data used in our study is the WAGE2 data set, with 935 observations, which has been used in previous social studies research. We analyzed the data with the software tool SAS Enterprise Miner, using its regression and decision tree models in particular. The results of our analysis show that, at least for the given data set, the decision tree model produced a better variable selection than the logistic regression model for predicting whether a person is likely to live in the South.
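As a rough illustration of the kind of comparison the abstract describes, the sketch below fits a logistic regression and scores single decision-tree splits on the same binary target, then looks at which predictor each approach favors. It is not the authors' SAS Enterprise Miner workflow and does not use the WAGE2 data: the data are synthetic, and the predictor names (`signal`, `noise`), the function names, and all parameter values are assumptions made for illustration only.

```python
import math
import random


def make_synthetic(n=400, seed=0):
    """Synthetic stand-in data (NOT the WAGE2 data set)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        signal = rng.gauss(0.0, 1.0)  # hypothetical informative predictor
        noise = rng.gauss(0.0, 1.0)   # hypothetical irrelevant predictor
        p = 1.0 / (1.0 + math.exp(-2.0 * signal))
        labels.append(1 if rng.random() < p else 0)
        rows.append([signal, noise])
    return rows, labels


def fit_logistic(rows, labels, lr=0.1, epochs=300):
    """Batch gradient descent on the log-loss; returns [bias, w_1, ..., w_k]."""
    k = len(rows[0])
    n = len(rows)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * x[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_gain(rows, labels, j):
    """Largest Gini impurity reduction from one threshold on feature j."""
    parent = gini(labels)
    pairs = sorted((x[j], y) for x, y in zip(rows, labels))
    n = len(pairs)
    best = 0.0
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


rows, labels = make_synthetic()
weights = fit_logistic(rows, labels)
gains = [best_split_gain(rows, labels, j) for j in range(2)]
print("logistic weights [bias, signal, noise]:", weights)
print("best split gain [signal, noise]:", gains)
```

On this toy data both approaches should rank the informative predictor above the irrelevant one. That says nothing about the paper's actual result, which depends on the WAGE2 variables and on how the tool performs variable selection.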