Housing Price Prediction by Divided Regression Analysis

IF 0.6 | Zone 4 | Multidisciplinary journals | Q3 MULTIDISCIPLINARY SCIENCES
Y. Goh, Y. Goh, Chun-Chieh Yip, K. Ng
Chiang Mai Journal of Science, published 2022-11-30. DOI: 10.12982/cmjs.2022.102 (https://doi.org/10.12982/cmjs.2022.102)
Citations: 0

Abstract

Housing Price Prediction by Divided Regression Analysis
Regression analysis is a statistical methodology for investigating the relationship between a dependent variable and the independent variables. In the current era of big data, statistical analysis of massive volumes of data can run into problems. For example, the heavy computing load makes the computation time-consuming, and the accuracy of the results might be affected by the sheer volume of data. Hence, divided regression analysis is proposed to reduce the computing load. This approach subdivides the dataset into several distinct subsets and fits a multiple linear regression to each subset. The results obtained from the subsets are then combined to obtain a divided regression model, which is treated as the model for the original overall dataset. The dataset used in this paper is the KC Housesales Data, obtained from the Kaggle website. It contains statistical information about housing prices, for example the size of the lot, the size of the living area and the selling price of the house. The goal of this paper is to predict the selling price of a house from the given attributes. The dataset is partitioned into five subsets, and a multiple linear regression is fitted to each subset. Model adequacy checks are then applied to the models. Testing for multicollinearity in the models is also important, because collinearity among the independent variables will affect the overall results. Hence, the variance inflation factor (VIF) approach is used to detect multicollinearity. Finally, the divided regression model is obtained by combining the results from all the subsets, and the validity of the divided regression model is verified.
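The workflow in the abstract (partition, fit per subset, combine, check VIF) can be illustrated with a minimal Python sketch. Several things here are assumptions and not taken from the paper: the data are synthetic stand-ins for the KC Housesales Data, the helper names (`fit_ols`, `divided_regression`, `vif`, `make_synthetic`) are hypothetical, and the combination rule is a plain average of the per-subset coefficients, since the abstract does not say how the subset results are combined.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def divided_regression(X, y, n_subsets=5, seed=0):
    """Fit OLS on disjoint random subsets and average the coefficients.

    Averaging is an assumption of this sketch, not necessarily the
    combination rule used in the paper.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))           # shuffle row indices
    parts = np.array_split(order, n_subsets)  # disjoint index blocks
    betas = np.array([fit_ols(X[idx], y[idx]) for idx in parts])
    return betas.mean(axis=0), betas


def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R_j^2)."""
    out = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        beta = fit_ols(others, target)
        fitted = np.column_stack([np.ones(len(others)), others]) @ beta
        ss_res = np.sum((target - fitted) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        out.append(ss_tot / ss_res)           # same as 1 / (1 - R^2)
    return np.array(out)


def make_synthetic(n=5000, seed=1):
    """Synthetic living-area / lot-size / price data (illustrative only)."""
    rng = np.random.default_rng(seed)
    living = rng.normal(2000, 500, n)
    lot = rng.normal(8000, 2000, n)
    X = np.column_stack([living, lot])
    price = 50_000 + 150 * living + 5 * lot + rng.normal(0, 20_000, n)
    return X, price


X, y = make_synthetic()
beta_divided, per_subset = divided_regression(X, y)  # combined model
beta_full = fit_ols(X, y)                            # full-data baseline
vif_values = vif(X)
```

Comparing `beta_divided` with `beta_full` is one simple way to check that the combined model agrees with a fit on the whole dataset. A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity.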
Source journal: Chiang Mai Journal of Science (MULTIDISCIPLINARY SCIENCES)
CiteScore: 1.00
Self-citation rate: 25.00%
Articles published: 103
Review time: 3 months
About the journal: The Chiang Mai Journal of Science is an international English-language peer-reviewed journal published in open-access electronic format six times a year (January, March, May, July, September and November) by the Faculty of Science, Chiang Mai University. Manuscripts in most areas of science are welcome, except in areas such as agriculture, engineering and medical science, which are outside the scope of the journal. Currently, the journal focuses on manuscripts in biology, chemistry, physics, materials science and environmental science. Papers in mathematics, statistics and computer science are also included but should be of an applied nature rather than purely theoretical. Manuscripts describing experiments on humans or animals must provide proof that all experiments were carried out according to the ethical regulations of the respective institutional and/or governmental authorities, and this should be clearly stated in the manuscript itself. The Editor reserves the right to reject manuscripts that fail to do so.