Machine learning applied to estate pricing for residential rentals in dynamic urban markets: The case of São Paulo city
Authors: Wesley F. Maia, Sergio A. David
Journal: Engineering Analysis with Boundary Elements, vol. 169, Article 105988 (Impact Factor 4.2; JCR Q1, Engineering, Multidisciplinary)
Published: 2024-10-08
DOI: 10.1016/j.enganabound.2024.105988
URL: https://www.sciencedirect.com/science/article/pii/S0955799724004612
Citations: 0
Abstract
This study investigates residential rental pricing in São Paulo city by combining machine learning with geospatial and natural language processing (NLP) analyses. The dataset comprises 47,243 rental listings gathered through web scraping. After data cleaning and preprocessing, 35,486 instances remained, described by variables that go beyond conventional metrics, including textual descriptions and geographic information. Several regression models were implemented and compared, including linear approaches, Support Vector Machines, and ensemble methods such as Gradient Boosting, LightGBM, and XGBoost. The Blending model, which integrates multiple modeling techniques, was the most accurate, achieving a Root Mean Squared Logarithmic Error (RMSLE) of 0.2923 on the test set. This result supports the advantage of hybrid modeling strategies in complex pricing tasks. The findings have practical implications: they give landlords and tenants a data-driven tool for informed decision-making that reflects the nuances and complexity of São Paulo’s real estate market. The model was also implemented in an interactive web application, which demonstrates its utility in a real-world scenario and can serve as a template for future applications in real estate analysis. By accurately estimating rental prices in dynamic urban markets, the work helps reduce the time and energy spent searching for and pricing residential rentals in a large city, supporting more confident and economical decisions within a social, sustainable, and technological perspective.
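The abstract reports its headline result as an RMSLE and attributes it to a blended model, but does not spell out either computation. The sketch below shows one common reading of both, under stated assumptions: RMSLE is computed with the `log(1 + x)` convention, and blending is taken to be a weighted average of base-model predictions. The paper's actual blending scheme, weights, and base models are not given in the abstract, so the `blend` helper, the weights, and all rent values here are hypothetical.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error, using the log(1 + x) convention."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true < 0) or np.any(y_pred < 0):
        raise ValueError("RMSLE is undefined for negative values")
    log_diff = np.log1p(y_pred) - np.log1p(y_true)
    return float(np.sqrt(np.mean(log_diff ** 2)))


def blend(predictions, weights):
    """Weighted average of per-model predictions (one row per model).

    Assumption: a plain weighted average; the paper may blend differently.
    """
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return weights @ predictions


# Hypothetical monthly rents and two hypothetical base-model outputs.
y_true = np.array([2500.0, 4000.0, 1800.0])
model_a = np.array([2300.0, 4400.0, 1900.0])
model_b = np.array([2700.0, 3800.0, 1700.0])

blended = blend([model_a, model_b], weights=[0.5, 0.5])

for name, pred in [("model_a", model_a), ("model_b", model_b), ("blend", blended)]:
    print(f"{name}: RMSLE = {rmsle(y_true, pred):.4f}")
```

Because RMSLE compares logarithms, it penalizes relative rather than absolute error, which suits prices that span a wide range. In this toy data the two models err in opposite directions, so averaging cancels part of the error; that illustrates why blending can help and is not evidence about the paper's models.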
About the journal:
This journal is specifically dedicated to the dissemination of the latest developments of new engineering analysis techniques using boundary elements and other mesh reduction methods.
Boundary element (BEM) and mesh reduction methods (MRM) are very active areas of research with the techniques being applied to solve increasingly complex problems. The journal stresses the importance of these applications as well as their computational aspects, reliability and robustness.
The main criteria for publication will be the originality of the work being reported, its potential usefulness and applications of the methods to new fields.
In addition to regular issues, the journal publishes a series of special issues dealing with specific areas of current research.
The journal has, for many years, provided a channel of communication between academics and industrial researchers working in mesh reduction methods.
Fields Covered:
• Boundary Element Methods (BEM)
• Mesh Reduction Methods (MRM)
• Meshless Methods
• Integral Equations
• Applications of BEM/MRM in Engineering
• Numerical Methods related to BEM/MRM
• Computational Techniques
• Combination of Different Methods
• Advanced Formulations