Housing Price Prediction by Divided Regression Analysis
Authors: Y. Goh, Y. Goh, Chun-Chieh Yip, K. Ng
Journal: Chiang Mai Journal of Science
Published: 2022-11-30
DOI: https://doi.org/10.12982/cmjs.2022.102
Citations: 0
Abstract
Regression analysis is a statistical methodology for investigating the relationship between a dependent variable and one or more independent variables. In the current era of big data, statistical analysis of massive datasets raises some problems. For example, the heavy computing load makes the computation time consuming, and the accuracy of the results might be affected by the vast volume of data. Hence, divided regression analysis is proposed to reduce the computing load. This approach divides the dataset into several unique subsets, then fits a multiple linear regression to each subset. The results obtained from each subset are then combined to obtain a divided regression model, which is treated as the model for the original overall dataset. The dataset used in this paper is KC Housesales Data, obtained from the Kaggle website. It contains statistical information about housing prices, for example, the size of the lot, the size of the living area and the selling price of the house. The goal of this paper is to predict the selling price of a house from the given attributes. The dataset is partitioned into five subsets, and a multiple linear regression is fitted to each subset. Model adequacy checks are then applied to the models. Testing for multicollinearity in the models is also important, because collinearity among the independent variables affects the overall results. Hence, the variance inflation factor (VIF) approach is used to detect multicollinearity. Finally, the divided regression model is obtained by combining the results from all the subsets, and the validity of the divided regression model is verified.
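The procedure outlined in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the abstract does not state how the per-subset results are combined, so the sketch assumes a sample-size-weighted average of the per-subset coefficients. It also uses synthetic data rather than the KC Housesales Data, and the function names (fit_ols, divided_regression, vif) are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def divided_regression(X, y, n_subsets=5, seed=0):
    """Fit OLS on disjoint random subsets and combine the coefficients.

    ASSUMPTION: the combining rule is a weighted average of the subset
    coefficients, weighted by subset size. The paper may use another rule.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))           # shuffle row indices
    parts = np.array_split(order, n_subsets)  # disjoint, near-equal subsets
    coefs = np.array([fit_ols(X[idx], y[idx]) for idx in parts])
    weights = np.array([len(idx) for idx in parts], dtype=float)
    combined = np.average(coefs, axis=0, weights=weights)
    return combined, coefs


def vif(X):
    """Variance inflation factor of each column: VIF_j = 1 / (1 - R_j^2),

    where R_j^2 comes from regressing column j on the remaining columns.
    """
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        coef = fit_ols(others, target)
        fitted = np.column_stack([np.ones(len(others)), others]) @ coef
        ss_res = np.sum((target - fitted) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        factors.append(ss_tot / ss_res)       # equals 1 / (1 - R_j^2)
    return np.array(factors)


# Synthetic stand-in for a housing dataset (illustrative values only).
rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 3))
y = 10.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=5000)

combined, per_subset = divided_regression(X, y, n_subsets=5)
full = fit_ols(X, y)

print("combined coefficients:", combined)
print("full-data coefficients:", full)
print("VIF per predictor:", vif(X))
```

Comparing the combined coefficients with the full-data fit is one simple way to check the divided model. A VIF near 1 suggests little collinearity, and values above roughly 5 to 10 are a common rule-of-thumb warning sign.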
About the journal:
The Chiang Mai Journal of Science is an international English language peer-reviewed journal which is published in open access electronic format 6 times a year in January, March, May, July, September and November by the Faculty of Science, Chiang Mai University. Manuscripts in most areas of science are welcomed except in areas such as agriculture, engineering and medical science which are outside the scope of the Journal. Currently, we focus on manuscripts in biology, chemistry, physics, materials science and environmental science. Papers in mathematics, statistics and computer science are also included but should be of an applied nature rather than purely theoretical. Manuscripts describing experiments on humans or animals are required to provide proof that all experiments have been carried out according to the ethical regulations of the respective institutional and/or governmental authorities and this should be clearly stated in the manuscript itself. The Editor reserves the right to reject manuscripts that fail to do so.