Multilabel Movie Genre Classification from Movie Subtitle: Parameter Optimized Hybrid Classifier

2021 4th International Symposium on Advanced Electrical and Communication Technologies (ISAECT). Pub Date: 2021-12-06. DOI: 10.1109/ISAECT53699.2021.9668427

Md. Mehedi Hasan, Sadia Tamim Dip, Tasmiah Rahman, Mst Sonia Akter, Imrus Salehin


Citations: 2

Abstract

Technological advances and commercial interest have made the categorization of media products increasingly routine in the digital environment. Categorization is usually a multi-label problem, in which one item may carry several category labels. Most of the literature, however, treats movie genre classification as a single-label task, generally based on audio-visual features. This study presents a multilabel movie genre classification model that uses supervised machine learning to assign movies to their corresponding genres. The novelty of the work lies in optimizing the parameters of the individual classifiers and combining them into a hybrid classification system: the proposed parameter-optimized hybrid technique combines SVM and DT. Classifier performance is compared across feature vectors built with the TF-IDF and BOW representations, with dimensionality reduced by chi-square feature selection. Performance is measured by recall, precision, and F1-measure. Based on the results, we recommend the parameter-optimized hybrid technique because it shows a high degree of accuracy regardless of the dataset and the feature vector. Among traditional classifiers, we recommend KNN, because it achieves high accuracy once a suitable value of the parameter K has been selected. SVM requires robust scaling to cope with an unbalanced dataset, and DT requires N-gram features to improve its accuracy.
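The chi-square feature selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the toy subtitle snippets, the genre labels, the binary bag-of-words (term presence) representation, the per-genre scoring, and the helper names `chi_square` and `select_terms`. It uses only the Python standard library and is not the authors' code.

```python
# Illustrative sketch only: toy data and helper names are hypothetical.
# Each document is a set of tokens (binary bag-of-words) with a set of genres.
SUBTITLES = [
    "gun chase explosion",
    "kiss love wedding",
    "gun love chase",
    "wedding kiss dance",
]
GENRES = [{"action"}, {"romance"}, {"action", "romance"}, {"romance"}]

DOCS = [set(text.lower().split()) for text in SUBTITLES]


def chi_square(docs, labels, term, genre):
    """Chi-square statistic of the 2x2 table (term present?) x (genre assigned?)."""
    a = b = c = d = 0
    for tokens, genres in zip(docs, labels):
        has_term = term in tokens
        has_genre = genre in genres
        if has_term and has_genre:
            a += 1  # term present, genre assigned
        elif has_term:
            b += 1  # term present, genre not assigned
        elif has_genre:
            c += 1  # term absent, genre assigned
        else:
            d += 1  # term absent, genre not assigned
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Term or genre occurs in all documents or in none: no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_terms(docs, labels, genre, k):
    """Return the k highest-scoring terms for one genre (ties broken alphabetically)."""
    vocab = sorted({t for tokens in docs for t in tokens})
    scored = [(chi_square(docs, labels, t, genre), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


if __name__ == "__main__":
    for genre in ("action", "romance"):
        print(genre, select_terms(DOCS, GENRES, genre, k=2))
```

Scoring each genre separately is one common way to handle the multilabel setting, since a movie may belong to several genres at once. Note that the statistic is symmetric: a term that reliably signals the absence of a genre scores as high as one that signals its presence, so the retained terms are the informative ones in either direction.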
