Multimodal deep hierarchical semantic-aligned matrix factorization method for micro-video multi-label classification

IF 7.4 · JCR Q1 · Region 1 (Management) · COMPUTER SCIENCE, INFORMATION SYSTEMS
Fugui Fan , Yuting Su , Yun Liu , Peiguang Jing , Kaihua Qu , Yu Liu
DOI: 10.1016/j.ipm.2024.103798
Journal: Information Processing & Management
Publication date: 2024-06-01 (Journal Article)
Citations: 0

Abstract


As one of the typical formats of user-generated content prevalent on social media platforms, micro-videos inherently incorporate multimodal characteristics associated with a group of label concepts. However, existing methods generally explore consensus features aggregated from all modalities to train a final multi-label predictor, while overlooking the fine-grained semantic dependencies between the modality and label domains. To address this problem, we present a novel multimodal deep hierarchical semantic-aligned matrix factorization (DHSAMF) method, which bridges the dual-domain semantic discrepancies and the inter-modal heterogeneity gap in the multi-label classification of micro-videos. Specifically, we utilize deep matrix factorization to individually explore hierarchical modality-specific representations. A series of semantic embeddings is introduced to facilitate latent semantic interactions between modality-specific representations and label features in a layerwise manner. To further improve the representation ability of each modality, we leverage the underlying correlation structures among instances to mine intra-modal complementary attributes, and maximize inter-modal alignment by aggregating consensus attributes in an optimal permutation. Experiments conducted on the MTSVRC and VidOR datasets demonstrate that DHSAMF outperforms other state-of-the-art methods by nearly 3% and 4%, respectively, in terms of the AP metric.
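The two computational ingredients named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses sequential NMF as a simple stand-in for the paper's deep (layerwise) matrix factorization of one modality, and scikit-learn's micro-averaged average precision as the AP metric; all dimensions (50 instances, 40 features, latent sizes 20 → 10, 5 labels) are hypothetical toy choices.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)

# Toy feature matrix for one modality: 50 micro-video instances x 40 features.
X = rng.random((50, 40))

# Layerwise ("deep") matrix factorization: X ≈ W2 @ H2 @ H1.
# Factorize X once, then factorize the first-layer representation again,
# yielding a hierarchy of modality-specific representations (W1, then W2).
layer1 = NMF(n_components=20, init="nndsvda", max_iter=500, random_state=0)
W1 = layer1.fit_transform(X)   # 50 x 20 first-layer instance representation
H1 = layer1.components_        # 20 x 40 first-layer basis

layer2 = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
W2 = layer2.fit_transform(W1)  # 50 x 10 deeper instance representation
H2 = layer2.components_        # 10 x 20 second-layer basis

err = np.linalg.norm(X - W2 @ H2 @ H1) / np.linalg.norm(X)
print(f"relative reconstruction error: {err:.3f}")

# The AP metric used in the experiments, on toy multi-label scores:
y_true = rng.integers(0, 2, size=(50, 5))  # 5 binary labels per instance
y_score = rng.random((50, 5))              # stand-in classifier scores
ap = average_precision_score(y_true, y_score, average="micro")
print(f"micro-averaged AP: {ap:.3f}")
```

The layerwise factorization mirrors the common pretraining scheme for deep matrix factorization (each layer factorizes the previous layer's representation); the paper's full method additionally aligns these representations with label features, which this sketch omits.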

Source journal
Information Processing & Management — Engineering & Technology, Computer Science: Information Systems
CiteScore: 17.00
Self-citation rate: 11.60%
Articles per year: 276
Review time: 39 days
About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Its scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing. The journal caters to both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field, with particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research.