Multi-Scale Transformers with dual attention and adaptive masking for sequential recommendation

Haiqin Li, Yuhan Yang, Jun Zeng, Min Gao, Junhao Wen

Information Processing & Management, Volume 63, Issue 1, Article 104318
DOI: 10.1016/j.ipm.2025.104318
Published: 2025-07-31
Citations: 0
Abstract
Sequential recommendation focuses on modeling and predicting a user's next actions from their sequential behavior patterns, using the temporal order and dynamics of user actions to provide more personalized and contextual suggestions. Existing sequential recommendation models rely on a limited range of temporal scales, making it challenging to explicitly capture diverse user behaviors that span multiple scales. Motivated by this challenge, this paper introduces ScaleRec, a Multi-Scale Transformer architecture augmented with dual attention mechanisms and adaptive masking for sequential recommendation. ScaleRec integrates interaction granularity and context through multi-scale division, segmenting user behavior sequences into patches of varying lengths. Dual attention, comprising intra-patch cross-attention and inter-patch self-attention, explicitly models fine-grained interests and coarse-grained preferences. Specifically, intra-patch cross-attention employs a learnable Gaussian kernel to introduce locality-based inductive biases, capturing fine-grained behavioral dynamics. Inter-patch self-attention is further enhanced by a Context-adaptive Preferences Aggregator, which dynamically selects and integrates relevant long-term user preferences. Additionally, we introduce an adaptive masking fusion strategy to dynamically filter redundant information. Extensive experiments on six benchmark datasets show that ScaleRec achieves state-of-the-art performance, improving recommendation performance by up to 24.95% in terms of HR@5. The code of the proposed model is available at: https://github.com/gangtann/ScaleRec.
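The abstract describes two mechanisms concretely enough to illustrate: segmenting a behavior sequence into fixed-length patches (multi-scale division uses several patch lengths), and adding a Gaussian locality bias to attention logits so nearby interactions attend to each other more strongly. The following is a minimal NumPy sketch, not the authors' implementation: the function names, the additive log-domain form of the Gaussian bias, the fixed (non-learnable) sigma, and the omission of learned query/key/value projections are all illustrative assumptions.

```python
import numpy as np

def split_into_patches(seq, patch_len):
    """Segment a behavior sequence of shape (n_items, d) into
    non-overlapping patches of length patch_len, zero-padding the tail."""
    n, d = seq.shape
    pad = (-n) % patch_len
    if pad:
        seq = np.vstack([seq, np.zeros((pad, d))])
    return seq.reshape(-1, patch_len, d)

def gaussian_locality_bias(length, sigma):
    """Additive bias on attention logits: positions close to each other
    receive a larger (less negative) bias, injecting a locality prior."""
    pos = np.arange(length)
    dist2 = (pos[:, None] - pos[None, :]) ** 2
    return -dist2 / (2.0 * sigma ** 2)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def biased_self_attention(x, sigma=2.0):
    """Scaled dot-product self-attention over x (length, d) with the
    Gaussian locality bias added to the logits before softmax."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d) + gaussian_locality_bias(len(x), sigma)
    return softmax(scores) @ x

# A short sequence of 10 interaction embeddings, patch length 4:
seq = np.arange(40, dtype=float).reshape(10, 4)
patches = split_into_patches(seq, 4)   # shape (3, 4, 4), tail zero-padded
out = biased_self_attention(seq)       # shape (10, 4)
```

In the paper's full model the kernel width is learnable and the bias is applied inside intra-patch cross-attention; running several patch lengths in parallel and fusing their outputs is what makes the division multi-scale.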
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.