Visiting Time Prediction Using Machine Learning Regression Algorithm

I. Hapsari, I. Surjandari, Komarudin, Reynaldi Ananda, Pane Mohamad, Syahrul Mubarok, Nanang, Mukti Ari, H. Murfi, Satrio Adi, N. Endro, Ariyanto Andrian, Rakhmatsyah, Aji Achmad, Indra Budi, Faisal Rahutomo, Rosa Andrie, Deddy Kusbianto, Purwoko Aji, Tedjo Darmanto, Fajar Hendra, Prabowo, M. Kemas, Lhaksmana Z. K Abdurahman, Baizal
DOI: 10.1109/ICOICT.2018.8528810 (https://doi.org/10.1109/ICOICT.2018.8528810)
Published in: 2018 6th International Conference on Information and Communication Technology (ICoICT)
Publication date: 2018-05-03
Citations: 3

Visiting Time Prediction Using Machine Learning Regression Algorithm
Smart tourists are inseparable from mobile technology. With a mobile device, a tourist can find information about a destination as well as supporting information such as transportation, hotels, weather, and exchange rates. To plan a journey, tourists need predictions of both traveling time and visiting time. Traveling time is already predicted accurately by Google Maps using the location feature, but visiting time is a different matter. To date, Google derives it by detecting users' positions, relying on crowdsourced data from customer visits to a specific location over the last several weeks. This method undoubtedly gives tourists valid information. However, because it requires a large amount of data, many destinations have no visiting-time information. In our case study of 626 destinations in East Java, Indonesia, only 224 destinations (35.78%) have a visiting time. To fill this gap and help tourists, this research developed a prediction model for visiting time. The data were first tested statistically to make sure the model was developed with an appropriate method. Multiple linear regression was the natural starting model, because six factors influence visiting time: access, government, rating, number of reviews, number of pictures, and other information. These factors serve as the independent variables, and visiting time is the dependent variable. In the normality test required for linear regression, the significance value fell below the required level, meaning the data failed the test, even after we transformed the data based on its skewness. Because three of the variables are ordinal and the others are interval data, we tried both excluding the ordinal variables and including them after transforming them to interval data. We also tried Ordinal Logistic Regression, converting the interval-valued dependent variable into ordinal data with Expectation Maximization, a clustering algorithm from machine learning, but the model still did not fit even though we tried five functions. We then applied five widely used machine learning algorithms: Linear Regression, k-Nearest Neighbors, Decision Tree, Support Vector Machines, and Multi-Layer Perceptron. Judged by maximum correlation coefficient and minimum root mean square error, Linear Regression with six independent variables gave the best result, with a correlation coefficient of 20.41% and a root mean square error of 48.46%. We also compared models using three independent variables; the best algorithm remained the same, but its performance was lower. Finally, the model was used to predict the visiting time for the other 402 destinations.
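The two selection criteria named in the abstract, correlation coefficient and root mean square error, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the visiting times, the candidate names `model_a` and `model_b`, and the tie-breaking rule (highest correlation first, then lowest error) are hypothetical and are not taken from the study. The last lines simply re-derive the coverage figures quoted in the abstract.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical visiting times in minutes (illustrative only, not study data).
actual = [60, 90, 120, 45, 150]
candidates = {
    "model_a": [70, 85, 110, 60, 130],
    "model_b": [100, 60, 90, 95, 100],
}

# Score every candidate with both metrics.
scores = {
    name: (pearson_r(actual, pred), rmse(actual, pred))
    for name, pred in candidates.items()
}

# Assumed ranking rule: prefer higher correlation, then lower RMSE.
best = max(scores, key=lambda name: (scores[name][0], -scores[name][1]))

# Coverage arithmetic using the counts stated in the abstract.
total_destinations = 626
with_visiting_time = 224
coverage_pct = round(100 * with_visiting_time / total_destinations, 2)
remaining = total_destinations - with_visiting_time

print(best, scores[best])
print(coverage_pct, remaining)
```

How the authors combined the two criteria when they disagreed is not stated in the abstract, so the lexicographic rule above is just one reasonable choice.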