Image-Text Integration Using a Multimodal Fusion Network Module for Movie Genre Classification

Leodécio Braz, Vinicius Teixeira, H. Pedrini, Z. Dias
DOI: 10.1049/icp.2021.1456
Published in: 11th International Conference of Pattern Recognition Systems (ICPRS 2021)
Citations: 2

Abstract

Multimodal models have received increasing attention from researchers for exploiting the complementarity of data modalities to improve inference. These multimodal models have been applied to several deep learning tasks, such as emotion recognition, video classification and audio-visual speech enhancement. In this paper, we propose a multimodal method with two branches, one for text classification and another for image classification. In the image classification branch, we use the Class Activation Mapping (CAM) method as an attention module to identify relevant regions of the images. To validate our method, we used the MM-IMDB dataset, which consists of 25,959 movies with their respective plot outlines, posters and genres. Our results showed that our method averaged 0.6749 in F1-Weight, 0.6734 in F1-Samples, 0.6750 in F1-Micro and 0.6159 in F1-Macro, outperforming the state of the art on the F1-Weight and F1-Macro metrics and ranking second best on the F1-Samples and F1-Micro metrics.
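The CAM-based attention described above can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, tensor shapes and the use of NumPy are assumptions made for illustration. The core idea of CAM is to weight the convolutional feature maps by the classifier weights of a target class, producing a spatial map that can then reweight the features as an attention mask:

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Compute a normalized CAM for one class.

    features:   (C, H, W) conv feature maps
    fc_weights: (num_classes, C) weights of the final linear classifier
    """
    w = fc_weights[class_idx]                          # (C,)
    cam = np.tensordot(w, features, axes=([0], [0]))   # weighted sum -> (H, W)
    cam = np.maximum(cam, 0)                           # keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                          # normalize to [0, 1]
    return cam

def apply_attention(features, cam):
    # Reweight every channel by the spatial attention map (broadcast over C).
    return features * cam[None, :, :]

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 7, 7))   # toy feature maps
w = rng.standard_normal((4, 8))          # toy classifier weights, 4 classes
cam = class_activation_map(feats, w, class_idx=2)
attended = apply_attention(feats, cam)
```

In a two-branch model like the one described, the attended image features would then be fused with the text-branch representation before the final multi-label genre classifier.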