Multilabel Movie Genre Classification from Movie Subtitle: Parameter Optimized Hybrid Classifier
Authors: Md. Mehedi Hasan, Sadia Tamim Dip, Tasmiah Rahman, Mst Sonia Akter, Imrus Salehin
Published in: 2021 4th International Symposium on Advanced Electrical and Communication Technologies (ISAECT), 2021-12-06
DOI: 10.1109/ISAECT53699.2021.9668427
Citations: 2
Abstract
Technological advances and commercial interest have made the categorization of media products increasingly common in the digital environment. Categorization is usually a multi-label problem, in which an item may be assigned several categories. Most of the literature, however, treats movie genre classification as a single-label task, generally based on audio-visual features. This study presents a multilabel movie genre classification model that uses supervised machine learning techniques to assign movies to their corresponding genres. The novelty of the work lies in optimizing the classifiers' parameters and combining the classifiers into a hybrid classification system: the proposed parameter-optimized hybrid technique combines SVM and DT. Classifier performance is compared across feature vectors built with the TF-IDF and BOW representations, with dimensionality reduced by chi-square feature selection. For the comparison, we measured recall, precision, and F1-measure for each classifier. Based on the results, we recommend the parameter-optimized hybrid classification technique because it shows a high degree of accuracy regardless of the dataset and the feature vector. Among traditional classifiers, we recommend KNN because it achieves high accuracy once a suitable value of the parameter K is selected. SVM requires robust scaling to cope with an unbalanced dataset, and DT requires N-gram features to improve its accuracy.
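The abstract reports recall, precision, and F1-measure but does not say how these are aggregated across genres. As a minimal sketch, assuming micro-averaging (pooling true positives, false positives, and false negatives over all movies and genres), the three measures could be computed for multilabel predictions as follows. The function name `micro_prf` and the toy genre sets are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Set, Tuple


def micro_prf(
    true_labels: Iterable[Set[str]],
    predicted_labels: Iterable[Set[str]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 for multilabel predictions.

    Assumption: counts are pooled over all movies and genres before the
    ratios are taken. The paper may have used a different averaging scheme.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)  # genres predicted and actually present
        fp += len(pred - truth)  # genres predicted but not present
        fn += len(truth - pred)  # genres present but not predicted

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical toy data: three movies, each labeled with a set of genres.
y_true = [{"action", "thriller"}, {"comedy"}, {"drama", "romance"}]
y_pred = [{"action"}, {"comedy", "romance"}, {"drama", "romance"}]

precision, recall, f1 = micro_prf(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

On this toy data there are 4 correct genre predictions, 1 spurious prediction, and 1 missed genre, so precision and recall both come to 4/5. Representing each movie's genres as a set keeps the multilabel nature explicit: a movie counts as partially correct when only some of its genres are recovered.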