用深度学习预测电视剧的IMDb评分:以《绿箭侠》为例

Anna Luiza Gomes, Getúlio Vianna, Tatiana Escovedo, Marcos Kalinowski
{"title":"用深度学习预测电视剧的IMDb评分:以《绿箭侠》为例","authors":"Anna Luiza Gomes, Getúlio Vianna, Tatiana Escovedo, Marcos Kalinowski","doi":"10.1145/3535511.3535520","DOIUrl":null,"url":null,"abstract":"Context: The number of TV series offered nowadays is very high. Due to its large amount, many series are canceled due to a lack of originality that generates a low audience. Problem: Having a decision support system that can show why some shows are a huge success or not would facilitate the choices of renewing or starting a show. Solution: We studied the case of the series Arrow broadcasted by CW Network and used descriptive and predictive modeling techniques to predict the IMDb rating. We assumed that the theme of the episode would affect its evaluation by users, so the dataset is composed only by the director of the episode, the number of reviews that episode got, the percentual of each theme extracted by the Latent Dirichlet Allocation (LDA) model of an episode, the number of viewers from Wikipedia and the rating from IMDb. The LDA model is a generative probabilistic model of a collection of documents made up of words. IS Theory: This study was developed under the aegis of Computational Learning Theory, which aims to understand the fundamental principles of learning and contribute to designing better-automated learning methods applied to the entertainment business. Method: In this prescriptive research, the case study method was used, and its results were analyzed using a quantitative approach. Summary of Results: With the features of each episode, the model that performed the best to predict the rating was Catboost due to a similar mean squared error of the KNN model but a better standard deviation during the test phase. It was possible to predict IMDb ratings with an acceptable root mean squared error of 0.55. Contributions and Impact in the IS area: The results show that deep learning techniques can be applied to support decisions in the entertainment field, allowing facilitating the decisions of renewing or starting a show. The rationale for building the model is detailed throughout the paper and can be replicated for other contexts.","PeriodicalId":106528,"journal":{"name":"Proceedings of the XVIII Brazilian Symposium on Information Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Predicting IMDb Rating of TV Series with Deep Learning: The Case of Arrow\",\"authors\":\"Anna Luiza Gomes, Getúlio Vianna, Tatiana Escovedo, Marcos Kalinowski\",\"doi\":\"10.1145/3535511.3535520\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Context: The number of TV series offered nowadays is very high. Due to its large amount, many series are canceled due to a lack of originality that generates a low audience. Problem: Having a decision support system that can show why some shows are a huge success or not would facilitate the choices of renewing or starting a show. Solution: We studied the case of the series Arrow broadcasted by CW Network and used descriptive and predictive modeling techniques to predict the IMDb rating. We assumed that the theme of the episode would affect its evaluation by users, so the dataset is composed only by the director of the episode, the number of reviews that episode got, the percentual of each theme extracted by the Latent Dirichlet Allocation (LDA) model of an episode, the number of viewers from Wikipedia and the rating from IMDb. The LDA model is a generative probabilistic model of a collection of documents made up of words. IS Theory: This study was developed under the aegis of Computational Learning Theory, which aims to understand the fundamental principles of learning and contribute to designing better-automated learning methods applied to the entertainment business. Method: In this prescriptive research, the case study method was used, and its results were analyzed using a quantitative approach. Summary of Results: With the features of each episode, the model that performed the best to predict the rating was Catboost due to a similar mean squared error of the KNN model but a better standard deviation during the test phase. It was possible to predict IMDb ratings with an acceptable root mean squared error of 0.55. Contributions and Impact in the IS area: The results show that deep learning techniques can be applied to support decisions in the entertainment field, allowing facilitating the decisions of renewing or starting a show. The rationale for building the model is detailed throughout the paper and can be replicated for other contexts.\",\"PeriodicalId\":106528,\"journal\":{\"name\":\"Proceedings of the XVIII Brazilian Symposium on Information Systems\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the XVIII Brazilian Symposium on Information Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3535511.3535520\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the XVIII Brazilian Symposium on Information Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3535511.3535520","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

背景:现在提供的电视剧数量非常多。由于数量庞大,很多电视剧因为缺乏原创性,收视率低而被取消。问题:有一个决策支持系统,可以显示为什么一些节目是巨大的成功或不成功,这将有助于选择续订或开始一个节目。解决方案:我们以CW电视网播出的美剧《绿箭侠》为例,使用描述和预测建模技术来预测IMDb评分。我们假设剧集的主题会影响用户对剧集的评价,因此数据集仅由剧集的导演、剧集获得的评论数量、剧集的潜在狄利克雷分配(Latent Dirichlet Allocation, LDA)模型提取的每个主题的百分比、维基百科的观众数量和IMDb的评分组成。LDA模型是由单词组成的文档集合的生成概率模型。IS理论:这项研究是在计算学习理论的支持下进行的,旨在理解学习的基本原理,并有助于设计更好的自动化学习方法,应用于娱乐业务。方法:采用个案研究法,对研究结果进行定量分析。结果总结:对于每一集的特征,Catboost模型在预测评分方面表现最好,因为它的均方误差与KNN模型相似,但在测试阶段有更好的标准差。用0.55的均方根误差来预测IMDb评分是可以接受的。在IS领域的贡献和影响:结果表明,深度学习技术可以应用于支持娱乐领域的决策,允许促进更新或开始表演的决策。本文详细介绍了构建模型的基本原理,并且可以复制到其他环境中。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Predicting IMDb Rating of TV Series with Deep Learning: The Case of Arrow
Context: The number of TV series offered nowadays is very high. Due to its large amount, many series are canceled due to a lack of originality that generates a low audience. Problem: Having a decision support system that can show why some shows are a huge success or not would facilitate the choices of renewing or starting a show. Solution: We studied the case of the series Arrow broadcasted by CW Network and used descriptive and predictive modeling techniques to predict the IMDb rating. We assumed that the theme of the episode would affect its evaluation by users, so the dataset is composed only by the director of the episode, the number of reviews that episode got, the percentual of each theme extracted by the Latent Dirichlet Allocation (LDA) model of an episode, the number of viewers from Wikipedia and the rating from IMDb. The LDA model is a generative probabilistic model of a collection of documents made up of words. IS Theory: This study was developed under the aegis of Computational Learning Theory, which aims to understand the fundamental principles of learning and contribute to designing better-automated learning methods applied to the entertainment business. Method: In this prescriptive research, the case study method was used, and its results were analyzed using a quantitative approach. Summary of Results: With the features of each episode, the model that performed the best to predict the rating was Catboost due to a similar mean squared error of the KNN model but a better standard deviation during the test phase. It was possible to predict IMDb ratings with an acceptable root mean squared error of 0.55. Contributions and Impact in the IS area: The results show that deep learning techniques can be applied to support decisions in the entertainment field, allowing facilitating the decisions of renewing or starting a show. The rationale for building the model is detailed throughout the paper and can be replicated for other contexts.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信