Comparison and Analysis of Machine Learning Models to Predict Hotel Booking Cancellation

Yiying Chen, Chuhan Ding, Hanjie Ye, Yuchen Zhou
Proceedings of the 2022 7th International Conference on Financial Innovation and Economic Development (ICFIED 2022)
DOI: 10.2991/aebmr.k.220307.225
Cited by: 2

Abstract

Hotel booking cancellation prediction is crucial for hotel revenue and resource management. This paper proposes three possible alternatives to neural networks, namely logistic regression, k-Nearest Neighbors (k-NN), and CatBoost, and finds CatBoost to be the most suitable model for hotels' cancellation prediction. The advantages of these models are effectiveness, high accuracy, and lower cost. The dataset used in this paper, adapted from Kaggle, contains booking data from two types of hotels in Portugal (a resort hotel and a city hotel) along with the corresponding customer information. We select key variables as predictors to train and test prediction models based on the three machine learning algorithms. After preprocessing the raw data (standardizing, handling missing data, recoding some variables, and scaling), we run the predictions and compare the models using three metrics: the confusion matrix, the accuracy score, and the F1-score. The results indicate that CatBoost performs best at predicting hotel booking cancellation, as it has the greatest number of correctly predicted samples and the highest accuracy score. We focus on the efficiency and economy of cancellation prediction in the hospitality industry to form a basis for future hotel revenue and resource management.
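The standardize, classify, and evaluate loop summarized above can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the two features (lead time in days and number of previous cancellations), the toy bookings, and k = 3 are hypothetical; label 1 is assumed to mean "canceled"; and logistic regression and CatBoost are omitted. The sketch standardizes features using training-set statistics only, classifies test bookings with k-NN, and reports confusion-matrix counts, accuracy, and F1.

```python
from collections import Counter
from math import sqrt


def standardize(train, test):
    """Z-score each feature using the mean and std of the training set only."""
    n_features = len(train[0])
    means = [sum(row[j] for row in train) / len(train) for j in range(n_features)]
    stds = []
    for j in range(n_features):
        var = sum((row[j] - means[j]) ** 2 for row in train) / len(train)
        stds.append(sqrt(var) or 1.0)  # guard against constant features

    def scale(rows):
        return [[(row[j] - means[j]) / stds[j] for j in range(n_features)] for row in rows]

    return scale(train), scale(test)


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest in Euclidean distance."""
    dists = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(row, query))), label)
        for row, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def evaluate(y_true, y_pred):
    """Confusion-matrix counts, accuracy, and F1 for the positive class (1 = canceled)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn, "accuracy": accuracy, "f1": f1}


# Hypothetical toy bookings: [lead_time_days, previous_cancellations]; label 1 = canceled
train_x = [[5, 0], [10, 0], [15, 0], [200, 1], [250, 2], [300, 1]]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [[8, 0], [220, 1], [120, 0], [280, 2]]
test_y = [0, 1, 1, 1]

train_s, test_s = standardize(train_x, test_x)
y_pred = [knn_predict(train_s, train_y, q, k=3) for q in test_s]
metrics = evaluate(test_y, y_pred)
print(y_pred, metrics)  # y_pred is [0, 1, 0, 1]: one canceled booking is missed
```

Standardization matters here because k-NN relies on Euclidean distance: unscaled, lead time (hundreds of days) would dominate the cancellation count. In practice, scikit-learn's `StandardScaler`, `KNeighborsClassifier`, `confusion_matrix`, `accuracy_score`, and `f1_score` cover the same steps.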