LOBSTER: Bilateral global semantic enhancement for multimedia recommendation
Jinfeng Xu, Zheyu Chen, Wei Wang, Xiping Hu, Jiyi Liu, Edith C.H. Ngai
Information Fusion, Volume 127, Article 103778, published 2025-09-30
DOI: 10.1016/j.inffus.2025.103778
https://www.sciencedirect.com/science/article/pii/S1566253525008401
Citations: 0
Abstract
Multimedia information floods the Internet, subtly influencing human society. Combining multimedia information to alleviate the data sparsity problem has become a popular approach amid the rapid development of recommender systems. However, many studies reveal that multimodal information can introduce cross-modality noise in some cases. A feasible way to alleviate cross-modality noise is to enhance the information that the modalities have in common. Recent advanced works enhance this common information between users (via user-user graphs) or between items (via item-item graphs) by using extra homogeneous graphs. However, these additional homogeneous graph structures inevitably incur substantial computational costs. To better extract common information among modalities while reducing computational costs, we propose a biLateral glOBal SemanTic Enhancement for multimedia Recommendation, called LOBSTER. Specifically, LOBSTER constructs two global semantic spaces, one for user representations and one for item representations, and enhances global/common semantic features on both the user and item sides through additional learnable representations shared across multiple modalities. LOBSTER further incorporates a layer-refined Graph Convolutional Network (GCN) and a dynamic optimization to alleviate the over-smoothing problem and to adjust attention levels for different modalities. Extensive experiments on three real-world datasets demonstrate that LOBSTER achieves competitive or superior performance compared to models incorporating homogeneous graphs, while providing an average 2.45× speedup and a 60.26% reduction in memory usage. Our code is available at https://github.com/Jinfeng-Xu/LOBSTER.
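The abstract describes the architecture only at a high level. As a rough illustration of the general idea, the following is a minimal sketch, assuming that the "additional learnable representations shared across multiple modalities" can be modeled as one global table for users and one for items, added to each modality's embeddings before propagation over the normalized user-item graph. Everything here is an assumption for illustration: the function names (`normalized_bipartite_adj`, `propagate`, `fuse_modalities`) are hypothetical, plain LightGCN-style layer averaging stands in for the paper's layer-refined GCN, and fixed softmax weights over per-modality scores stand in for its dynamic optimization. The authors' repository remains the reference for the actual method.

```python
import numpy as np


def normalized_bipartite_adj(R):
    """Return D^{-1/2} A D^{-1/2} for the user-item graph A = [[0, R], [R^T, 0]]."""
    n_users, n_items = R.shape
    n = n_users + n_items
    A = np.zeros((n, n))
    A[:n_users, n_users:] = R
    A[n_users:, :n_users] = R.T
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5  # isolated nodes keep weight 0
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]


def propagate(A_hat, E0, n_layers):
    """LightGCN-style propagation: mean of the layer-0..n_layers embeddings."""
    layers = [E0]
    E = E0
    for _ in range(n_layers):
        E = A_hat @ E
        layers.append(E)
    return np.mean(layers, axis=0)


def fuse_modalities(R, modal_embs, global_user, global_item, modality_scores, n_layers=2):
    """Hypothetical fusion: shared global embeddings + per-modality GCN + softmax weights.

    modal_embs: list of (user_emb, item_emb) pairs, one per modality.
    global_user / global_item: learnable tables shared by every modality (assumption).
    modality_scores: one (learnable) score per modality, turned into weights.
    """
    A_hat = normalized_bipartite_adj(R)
    n_users = R.shape[0]
    weights = np.exp(modality_scores - modality_scores.max())
    weights /= weights.sum()  # softmax over modalities
    fused = 0.0
    for w, (user_m, item_m) in zip(weights, modal_embs):
        # The same global tables are added on both sides for every modality.
        E0 = np.vstack([user_m + global_user, item_m + global_item])
        fused = fused + w * propagate(A_hat, E0, n_layers)
    return fused[:n_users], fused[n_users:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R = np.array([[1, 0, 1], [0, 1, 1]], dtype=float)  # 2 users x 3 items
    d = 4
    # Two modalities, e.g. visual and textual (random placeholders).
    modal = [(rng.normal(size=(2, d)), rng.normal(size=(3, d))) for _ in range(2)]
    g_user, g_item = rng.normal(size=(2, d)), rng.normal(size=(3, d))
    users, items = fuse_modalities(R, modal, g_user, g_item, np.array([0.3, -0.1]))
    scores = users @ items.T  # user-item preference scores
    print(scores.shape)  # (2, 3)
```

In this sketch no user-user or item-item graph is built: the only extra parameters are the two global tables, of size (number of users × d) and (number of items × d), which is the kind of saving the abstract attributes to avoiding homogeneous graphs.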
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.