Movie Revenue Prediction Using Regression and Clustering
Pakom Walanaraya, Weerapat Puengpipattrakul, D. Sutivong
2018 2nd International Conference on Engineering Innovation (ICEI), publication date 2018-07-01
DOI: 10.1109/ICEI18.2018.8448610
Abstract
Among the many movies that have been released, some generate high profits while others do not. This paper studies the relationship between movie factors and revenue and builds models to predict revenue. In addition to analyzing the aggregate data, we divide the data into groups using different methods, compare prediction accuracy across these techniques, and explore whether clustering can improve accuracy. The study proceeded in two major steps. First, linear regression, polynomial regression, and support vector regression (SVR) were applied to the entire movie dataset to predict revenue. Then, before the regression analyses were run, the data were divided into groups using approaches such as grouping by genre, Expectation Maximization (EM) clustering, and K-means clustering. To compare accuracy among the techniques, R-square and the root-mean-square error (RMSE) were used as performance indicators. Our study shows that, in general, linear regression without clustering yields the model with the highest R-square, while linear regression with EM clustering yields the lowest RMSE.
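The cluster-then-regress idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single hypothetical numeric feature (called `budget` here purely for illustration), made-up toy data, K-means run on that one feature, and plain ordinary least squares fitted separately inside each cluster. Polynomial regression, SVR, and EM clustering are not shown. It uses only the Python standard library and scores predictions with R-square and RMSE.

```python
import math
import random


def kmeans_1d(xs, k, iters=100, seed=0):
    """Lloyd's K-means on scalar values; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)  # k distinct points as initial centers
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in xs]
        # Update step: each center moves to the mean of its points
        # (an empty cluster keeps its old center).
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # converged: assignments will no longer change
        centers = new_centers
    return labels


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # degenerate cluster: predict the cluster mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def r_square(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot (needs SS_tot > 0)."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def rmse(ys, preds):
    """Root-mean-square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))


def clustered_regression(xs, ys, k):
    """Cluster with K-means, then fit and apply one OLS line per cluster."""
    labels = kmeans_1d(xs, k)
    preds = [0.0] * len(xs)
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        for i in idx:
            preds[i] = a + b * xs[i]
    return preds


# Hypothetical toy data (not from the paper): two regimes with different trends.
budget = [1, 2, 3, 4, 5, 20, 21, 22, 23, 24]
revenue = [2, 4, 6, 8, 10, 80, 79, 78, 77, 76]

a, b = fit_line(budget, revenue)
global_preds = [a + b * x for x in budget]
cluster_preds = clustered_regression(budget, revenue, k=2)

print(f"global:    R2={r_square(revenue, global_preds):.3f}  RMSE={rmse(revenue, global_preds):.3f}")
print(f"clustered: R2={r_square(revenue, cluster_preds):.3f}  RMSE={rmse(revenue, cluster_preds):.3f}")
```

One design caveat: when fitted and scored on the same data, per-cluster least-squares lines can never have a larger total squared error than a single global line, because the global line is itself one candidate within each cluster. A fair comparison of clustered and unclustered models therefore needs held-out data; the in-sample toy numbers printed here say nothing about how the paper evaluated its models.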