W. Coleman, Ben Johann, Nicholas Pasternak, Jaya Vellayan, N. Foutz, Heman Shakeri
Published in: 2022 Systems and Information Engineering Design Symposium (SIEDS), 2022-04-28. DOI: 10.48550/arXiv.2205.01180 (https://doi.org/10.48550/arXiv.2205.01180)
Using Machine Learning to Evaluate Real Estate Prices Using Location Big Data
As more buyers try to enter the real estate market, accurate valuations of residential and commercial properties have become crucial. Past research has used static real estate data (e.g., number of beds, baths, square footage), sometimes combined with demographic information, to predict property prices. In this investigation, we attempted to improve on that work by asking whether mobile location data can improve the predictive power of popular regression and tree-based models. To prepare the data, we attached the mobility data to individual properties in the real estate data, aggregating the users observed within 500 meters of each property for each day of the week. We removed people who lived within 500 meters of each property, so each property's aggregated mobility data contained only non-resident census features. On top of these dynamic census features, we also included static census features: the number of people in the area, the average proportion of people commuting, and the number of residents in the area. Finally, we tested multiple models to predict real estate prices. Our proposed model consists of two random forest modules stacked under a ridge regression that takes the random forest outputs as predictors. The first random forest used static features only, and the second used dynamic features only. The model with dynamic mobile location features achieved a 3% lower mean squared error than the same model without them.
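The stacked architecture described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, and everything specific in it is hypothetical: the function names, the hyperparameters, the synthetic data, the column counts (3 static, 7 dynamic, one per day of the week), and the use of out-of-fold predictions to train the ridge. The abstract does not state how the ridge inputs were produced, so this is one plausible reading, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict


def fit_stacked_forests(X_static, X_dynamic, y, seed=0):
    """Fit one forest per feature group, then a ridge on their predictions."""
    rf_static = RandomForestRegressor(n_estimators=50, random_state=seed)
    rf_dynamic = RandomForestRegressor(n_estimators=50, random_state=seed)

    # Assumption: train the ridge on out-of-fold forest predictions so it
    # never sees a prediction made by a forest fitted on that same row.
    oof_static = cross_val_predict(rf_static, X_static, y, cv=5)
    oof_dynamic = cross_val_predict(rf_dynamic, X_dynamic, y, cv=5)
    meta = Ridge(alpha=1.0).fit(np.column_stack([oof_static, oof_dynamic]), y)

    # Refit both forests on all training rows for use at prediction time.
    rf_static.fit(X_static, y)
    rf_dynamic.fit(X_dynamic, y)
    return rf_static, rf_dynamic, meta


def predict_stacked(models, X_static, X_dynamic):
    """Run both forests, then combine their outputs with the ridge."""
    rf_static, rf_dynamic, meta = models
    stacked = np.column_stack(
        [rf_static.predict(X_static), rf_dynamic.predict(X_dynamic)]
    )
    return meta.predict(stacked)


# Synthetic placeholder data (not the paper's data set).
rng = np.random.default_rng(0)
n = 300
X_static = rng.normal(size=(n, 3))    # hypothetical static census features
X_dynamic = rng.normal(size=(n, 7))   # hypothetical per-weekday visitor counts
y = 2.0 * X_static[:, 0] + X_dynamic[:, 0] + rng.normal(scale=0.1, size=n)

# Train on the first 200 rows, predict prices for the remaining 100.
models = fit_stacked_forests(X_static[:200], X_dynamic[:200], y[:200])
preds = predict_stacked(models, X_static[200:], X_dynamic[200:])
```

Keeping the static and dynamic features in separate forests means the two ridge coefficients in `models[2].coef_` give a rough indication of how much weight each feature group receives in the final prediction.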