Knowledge-Based Systems: Latest Articles

From words to visuals: Bridging text and visual insights using MetA-MARC framework for enhanced scholarly article categorization
IF 7.2, CAS Q1, Computer Science
Knowledge-Based Systems Pub Date: 2025-06-14 DOI: 10.1016/j.knosys.2025.113896
Abhijit Mitra, Jayanta Paul, Tanis Ahamed, Sagar Basak, Jaya Sil
{"title":"From words to visuals: Bridging text and visual insights using MetA-MARC framework for enhanced scholarly article categorization","authors":"Abhijit Mitra,&nbsp;Jayanta Paul,&nbsp;Tanis Ahamed,&nbsp;Sagar Basak,&nbsp;Jaya Sil","doi":"10.1016/j.knosys.2025.113896","DOIUrl":"10.1016/j.knosys.2025.113896","url":null,"abstract":"<div><div>The rapid growth of technology has led to approximately 28,100 journals disseminating 2.5 million research articles annually, posing significant challenges in locating and categorizing articles of interest. Search engines, citation indexes, and digital libraries often return predominantly irrelevant papers due to limited indexing. Existing classification techniques leveraging content and metadata face challenges such as incomplete data and lack of semantic context. Metadata-based methods frequently rely on statistical metrics that neglect semantic meanings and require subject expertise for threshold setting. To address these issues, we propose <span>Metadata-Driven Attention-Based Multimodal Academic Research Classifier (MetA-MARC)</span>, a framework leveraging the pretrained CLIP model to integrate text and image modalities for enhanced scholarly article classification. <span>MetA-MARC</span> captures semantic and contextual meaning by integrating metadata, OCR-extracted features, and images through CLIP (Contrastive Language-Image Pre-training). It introduces a novel textual inversion approach to map images to pseudo-word tokens in the CLIP embedding space for robust multimodal representations. The framework employs <span>FusionWeave</span>, a multimodal fusion network combining features using concatenation, cross fusion, and attention-based techniques, alongside <span>Modality-Driven Adaptive Re-weighting (MoDAR)</span> to dynamically prioritize relevant features. Experiments on JUCS, ACM, and proprietary <span>CompScholar</span> datasets demonstrate average accuracies of 0.86, 0.84, and 0.8848, respectively, surpassing state-of-the-art methods by up to 4.05%. These results highlight <span>MetA-MARC’s</span> potential as a robust, adaptive tool for automated scholarly article classification, effectively bridging text and visual modalities.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"324 ","pages":"Article 113896"},"PeriodicalIF":7.2,"publicationDate":"2025-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144306199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
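The MetA-MARC abstract above describes FusionWeave (concatenation, cross fusion, and attention) together with MoDAR re-weighting of modalities. The paper's exact architecture is not given here, so the following is a minimal sketch of one plausible reading: a learned gate re-weights pooled text and image embeddings (assumed 512-d, CLIP-style), followed by cross-attention and concatenation. All module names, dimensions, and the classifier head are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of attention-based multimodal fusion with adaptive modality
# re-weighting; dimensions and module names are assumptions, not the paper's design.
import torch
import torch.nn as nn

class GatedMultimodalFusion(nn.Module):
    def __init__(self, dim=512, num_heads=8, num_classes=10):
        super().__init__()
        # Per-modality scalar weights conditioned on both embeddings (MoDAR-like re-weighting).
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))
        # Cross fusion: text queries attend to image features.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, text_emb, image_emb):
        # text_emb, image_emb: (batch, dim) pooled CLIP-style features.
        weights = self.gate(torch.cat([text_emb, image_emb], dim=-1))  # (batch, 2)
        t = weights[:, 0:1] * text_emb
        v = weights[:, 1:2] * image_emb
        # Treat each embedding as a length-1 token sequence for cross-attention.
        fused, _ = self.cross_attn(t.unsqueeze(1), v.unsqueeze(1), v.unsqueeze(1))
        fused = fused.squeeze(1)
        # Concatenation branch, then classification.
        return self.classifier(torch.cat([fused, v], dim=-1))

if __name__ == "__main__":
    model = GatedMultimodalFusion()
    logits = model(torch.randn(4, 512), torch.randn(4, 512))
    print(logits.shape)  # torch.Size([4, 10])
```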
Enhancing math reasoning ability of large language models via computation logic graphs
IF 7.2, CAS Q1, Computer Science
Knowledge-Based Systems Pub Date: 2025-06-14 DOI: 10.1016/j.knosys.2025.113905
Deji Zhao, Donghong Han, Jia Wu, Zhongjiang He, Bo Ning, Ye Yuan, Yongxiang Li, Chao Wang, Shuangyong Song
{"title":"Enhancing math reasoning ability of large language models via computation logic graphs","authors":"Deji Zhao ,&nbsp;Donghong Han ,&nbsp;Jia Wu ,&nbsp;Zhongjiang He ,&nbsp;Bo Ning ,&nbsp;Ye Yuan ,&nbsp;Yongxiang Li ,&nbsp;Chao Wang ,&nbsp;Shuangyong Song","doi":"10.1016/j.knosys.2025.113905","DOIUrl":"10.1016/j.knosys.2025.113905","url":null,"abstract":"<div><div>The reasoning capabilities of large language models (LLMs) are essential for a wide range of tasks, particularly in the domain of mathematical reasoning. Common chain of thought methods perform well in handling simple reasoning problems, but for complex problems, a single-dimensional chain of thought is inadequate to address multi-layered logical relationships. To tackle this challenge, this paper introduces the concept of a Computation Logic Graph (CLG), designed to enhance the logical reasoning abilities of LLMs when solving complex mathematical problems. The CLG decomposes complex mathematical problems into multiple simple intermediate computational units, and the final answer is obtained through multiple iterations of these units. On the one hand, the CLG improves the model’s ability to decompose and solve complex mathematical problems step-by-step from a global perspective. On the other hand, the local inference process within the CLG helps enhance the model’s accuracy in single step calculations. To develop models with the ability to construct Computation Logic Graphs automatically, we create a dataset of computational logic graphs for complex mathematical problems, called the Computation-intensive Math Logic Graph (CMLG) dataset. We fine-tune several open-source LLMs using the CMLG dataset. Experimental results demonstrate that the proposed CLG method significantly enhances the performance of LLMs in complex mathematical reasoning tasks, outperforming on both the CMLG dataset and six other publicly available datasets from diverse domains.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"325 ","pages":"Article 113905"},"PeriodicalIF":7.2,"publicationDate":"2025-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144365463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
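To make the CLG idea of decomposing a problem into simple computational units and iterating over them concrete, the sketch below evaluates a tiny hand-built graph in dependency order. It is not the CMLG data format or the authors' construction procedure, just a minimal stand-in for what a computation logic graph could look like.

```python
# Minimal, hypothetical computation logic graph: nodes are simple arithmetic units,
# edges carry intermediate results, and evaluation iterates until every unit is resolved.
import operator
from dataclasses import dataclass, field

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul, "div": operator.truediv}

@dataclass
class Unit:
    name: str
    op: str = None                      # None marks an input node
    inputs: list = field(default_factory=list)
    value: float = None

def evaluate(units):
    """Iterate until every unit has a value (assumes the graph is acyclic)."""
    table = {u.name: u for u in units}
    pending = [u for u in units if u.value is None]
    while pending:
        for u in list(pending):
            args = [table[name].value for name in u.inputs]
            if all(a is not None for a in args):
                u.value = OPS[u.op](*args)
                pending.remove(u)
    return table

# Example problem: "A buys 3 pens at 2.5 each and pays with 10; what is the change?"
graph = [
    Unit("pens", value=3), Unit("price", value=2.5), Unit("paid", value=10),
    Unit("cost", op="mul", inputs=["pens", "price"]),
    Unit("change", op="sub", inputs=["paid", "cost"]),
]
print(evaluate(graph)["change"].value)  # 2.5
```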
CIDNet: Cross-Scale Interference Mining Detection Network for underwater object detection
IF 7.2, CAS Q1, Computer Science
Knowledge-Based Systems Pub Date: 2025-06-14 DOI: 10.1016/j.knosys.2025.113902
Gaoli Zhao, Kefei Zhang, Liangzhi Wang, Wenyi Zhao, Weidong Zhang
{"title":"CIDNet: Cross-Scale Interference Mining Detection Network for underwater object detection","authors":"Gaoli Zhao ,&nbsp;Kefei Zhang ,&nbsp;Liangzhi Wang ,&nbsp;Wenyi Zhao ,&nbsp;Weidong Zhang","doi":"10.1016/j.knosys.2025.113902","DOIUrl":"10.1016/j.knosys.2025.113902","url":null,"abstract":"<div><div>Underwater object detection plays a crucial role in advancing marine economics, protecting the environment, and promoting the planet’s sustainable development. Compared to land-based scenes, underwater object detection is often hindered by color deviation and low visibility. To effectively address these interference issues, we propose a Cross-Scale Interference Mining Detection Network (CIDNet). We first extract multidimensional feature representations from the input images using a standard residual network backbone, which uses a deep structure and residual connectivity mechanism. We then refine these features through interference mining and cross-scale feature fusion strategies, and further enhance feature hierarchy levels using adaptive feature mapping optimization. In addition, we introduce three-dimensional convolution combination with a channel dimension unification strategy to enhance the fine-grained representation of hierarchical feature layers. Finally, the refined features are fed into a Task-aligned detection head module, which improves the detection accuracy by optimizing a collaboration between classification and localization tasks through a task-aligned learning strategy. Extensive experiments conducted on the DUO and COCO datasets demonstrate that our method effectively detects hidden objects in realistic underwater scenes and significantly outperforms current state-of-the-art methods in terms of accuracy. The codes and model weights will be available at <span><span>https://www.researchgate.net/publication/390270613_CIDNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"324 ","pages":"Article 113902"},"PeriodicalIF":7.2,"publicationDate":"2025-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144298746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
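The cross-scale feature fusion step in the CIDNet abstract is not specified in detail here; a common pattern, sketched below under that assumption, is to project feature maps from different backbone stages to a shared width, resize them to one resolution, and mix them with a 1x1 convolution. Channel counts and the bilinear interpolation are illustrative choices, not the paper's.

```python
# Hypothetical cross-scale feature fusion: align multi-stage feature maps to one
# resolution, concatenate along channels, and mix with a 1x1 convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleFusion(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        # Project every stage to the same channel width before merging.
        self.proj = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.mix = nn.Conv2d(out_channels * len(in_channels), out_channels, 1)

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) maps from shallow to deep stages.
        target = feats[0].shape[-2:]
        aligned = [
            F.interpolate(p(f), size=target, mode="bilinear", align_corners=False)
            for p, f in zip(self.proj, feats)
        ]
        return self.mix(torch.cat(aligned, dim=1))

if __name__ == "__main__":
    feats = [torch.randn(2, 256, 64, 64), torch.randn(2, 512, 32, 32), torch.randn(2, 1024, 16, 16)]
    print(CrossScaleFusion()(feats).shape)  # torch.Size([2, 256, 64, 64])
```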
Efficient mining of incremental high utility patterns with negative unit profits over all the accumulated stream data
IF 7.2, CAS Q1, Computer Science
Knowledge-Based Systems Pub Date: 2025-06-13 DOI: 10.1016/j.knosys.2025.113956
Doyoung Kim, Heonho Kim, Seungwan Park, Hanju Kim, Myungha Cho, Seongbin Park, Taewoong Ryu, Chanhee Lee, Hyeonmo Kim, Unil Yun
{"title":"Efficient mining of incremental high utility patterns with negative unit profits over all the accumulated stream data","authors":"Doyoung Kim ,&nbsp;Heonho Kim ,&nbsp;Seungwan Park ,&nbsp;Hanju Kim ,&nbsp;Myungha Cho ,&nbsp;Seongbin Park ,&nbsp;Taewoong Ryu,&nbsp;Chanhee Lee,&nbsp;Hyeonmo Kim,&nbsp;Unil Yun","doi":"10.1016/j.knosys.2025.113956","DOIUrl":"10.1016/j.knosys.2025.113956","url":null,"abstract":"<div><div>Traditional high utility pattern mining had considered that items in databases have positive unit profits, but considering negative unit profits is often required in real life. Thus, many algorithms considering both positive and negative unit profits have been proposed in static data environments. Meanwhile, one of the most important parts of data analysis is how to handle the accumulated stream data in real-world systems. However, existing methods considering negative unit profits in a static environment are inadequate for processing data streams, as they require repeated data access, incurring additional resources with multiple data scans. This paper suggests an effective method considering positive and negative unit profits and dynamic databases for high utility stream pattern mining. To avoid storing data in memory and scanning it multiple times, the proposed approach constructs its data structure by performing a single scan of the incremental data without storing it in the memory. Then, through a reconstruction process, it efficiently integrates and manages the new data while optimally maintaining the structures. This methodology enables efficient mining without the loss of significant patterns. Experiments with real and synthetic datasets show that the proposed approach has improved performance to state-of-the-art methods, including adjusted approaches, regarding runtime, memory usage, and scalability. In addition, the proposed method demonstrates enhanced performance than the baseline method in terms of the resources of each process and the number of incremental databases. Further statistical evaluation of the accuracy test shows that the proposed method extracts results without pattern loss or duplication.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"325 ","pages":"Article 113956"},"PeriodicalIF":7.2,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144329721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
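To make the notion of "high utility with negative unit profits" concrete, the sketch below computes the utility of candidate itemsets over a small transaction batch, where utility is quantity times unit profit and some profits are negative. The profit table, transactions, threshold, and brute-force enumeration are all made-up illustrations of the utility measure only, not the authors' incremental structures or pruning strategy.

```python
# Illustrative utility computation with positive and negative unit profits.
# utility(itemset, transaction) = sum(quantity * unit_profit) over items in the itemset;
# an itemset is "high utility" if its total utility over the data meets a threshold.
from itertools import combinations

PROFITS = {"a": 4, "b": -2, "c": 6, "d": -1}   # unit profits, some negative
TRANSACTIONS = [                                # item -> purchased quantity
    {"a": 2, "b": 1, "c": 1},
    {"b": 3, "c": 2, "d": 5},
    {"a": 1, "c": 3},
]

def itemset_utility(itemset, db):
    total = 0
    for tx in db:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * PROFITS[item] for item in itemset)
    return total

def high_utility_itemsets(db, min_util):
    items = sorted(PROFITS)
    results = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = itemset_utility(itemset, db)
            if u >= min_util:
                results[itemset] = u
    return results

# Items with negative profit (b, d) drag down the utility of any itemset containing them.
print(high_utility_itemsets(TRANSACTIONS, min_util=15))  # {('c',): 36, ('a', 'c'): 36}
```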
Multimodal representation fusion method for dense video captioning
IF 7.2, CAS Q1, Computer Science
Knowledge-Based Systems Pub Date: 2025-06-13 DOI: 10.1016/j.knosys.2025.113856
Haojie Fang, Yonggang Li, Yingjian Li
{"title":"Multimodal representation fusion method for dense video captioning","authors":"Haojie Fang ,&nbsp;Yonggang Li ,&nbsp;Yingjian Li","doi":"10.1016/j.knosys.2025.113856","DOIUrl":"10.1016/j.knosys.2025.113856","url":null,"abstract":"<div><div>Dense video captioning aims to locate multiple events from untrimmed videos and generate corresponding captions for each meaningful event. The application of multimodal information(e.g., video, audio) for dense video captioning has recently achieved great success. However, learning the information interactions between different modalities while achieving cross-modal feature alignment is highly challenging for an encoder. Recent studies of several multimodal tasks have shown that multimodal models benefit from shared and individual representations. Thus, in this paper, we propose a novel feature fusion module, which uses shared and individual modality representations to capture commonalities and complementary relationships between modalities. Moreover, the proposed model bridges the gap between shared modality representations, which helps to obtain deeper cross-modal associations for better feature interaction and alignment. Furthermore, to compensate for the limitation that different level proposal heads do not interact sufficiently during event detection, we propose a multilevel information interaction mechanism to dynamically adjust and fuse the information among different level proposal heads in the event detection module. Based on the ActivityNet Captions, subdatasets of ActivityNet Captions and YouCook2, we conducted comprehensive experiments to evaluate the performance of our proposed model. The experimental results show that our model achieves impressive performance compared with state-of-the-art methods.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"324 ","pages":"Article 113856"},"PeriodicalIF":7.2,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144298750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
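The key idea in this abstract is keeping a shared representation (common to video and audio) alongside individual per-modality representations. A minimal sketch of that split follows: one shared projection applied to both modalities plus one private projection per modality, with the fused output available for downstream captioning. The dimensions, the simple linear encoders, and the absence of alignment losses are assumptions for brevity, not the paper's design.

```python
# Hypothetical shared/individual representation split for two modalities.
import torch
import torch.nn as nn

class SharedPrivateEncoder(nn.Module):
    def __init__(self, video_dim=1024, audio_dim=128, dim=256):
        super().__init__()
        self.video_in = nn.Linear(video_dim, dim)
        self.audio_in = nn.Linear(audio_dim, dim)
        self.shared = nn.Linear(dim, dim)          # one encoder reused by both modalities
        self.video_private = nn.Linear(dim, dim)   # modality-specific encoders
        self.audio_private = nn.Linear(dim, dim)

    def forward(self, video, audio):
        v, a = self.video_in(video), self.audio_in(audio)
        shared_v, shared_a = self.shared(v), self.shared(a)
        private_v, private_a = self.video_private(v), self.audio_private(a)
        # Fused representation: commonalities plus complementary, modality-specific parts.
        fused = torch.cat([shared_v + shared_a, private_v, private_a], dim=-1)
        return fused, (shared_v, shared_a)  # the shared pair can feed an alignment loss

if __name__ == "__main__":
    enc = SharedPrivateEncoder()
    fused, (sv, sa) = enc(torch.randn(2, 100, 1024), torch.randn(2, 100, 128))
    print(fused.shape)  # torch.Size([2, 100, 768])
```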
PredVSD: Video saliency prediction based on conditional diffusion model
IF 7.2, CAS Q1, Computer Science
Knowledge-Based Systems Pub Date: 2025-06-13 DOI: 10.1016/j.knosys.2025.113820
Chenming Li, Shiguang Liu
{"title":"PredVSD: Video saliency prediction based on conditional diffusion model","authors":"Chenming Li ,&nbsp;Shiguang Liu","doi":"10.1016/j.knosys.2025.113820","DOIUrl":"10.1016/j.knosys.2025.113820","url":null,"abstract":"<div><div>Mainstream deep learning methods for video saliency prediction often use 3D CNNs or Vision Transformers as encoder–decoders, relying on task-specific loss functions to implicitly map input frames to saliency maps. However, these methods are limited by their capacity for salient feature expression. In this study, inspired by the recent advances of diffusion models in video processing tasks, we propose a Conditional Diffusion Model for Video Saliency Prediction (PredVSD), which leverages semantic video features and saliency-specific encodings as conditions to capture more representative saliency features from the target data distribution. To effectively integrate multi-scale visual features and saliency priors, we design an auxiliary network, Saliency-PyramidU-Net, allowing the denoising process to focus more on salient regions across the spatial–temporal plane. Extensive experiments confirm PredVSD’s strong performance across visual and audio-visual datasets.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"324 ","pages":"Article 113820"},"PeriodicalIF":7.2,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144298747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
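As background on the conditional diffusion formulation behind PredVSD, the sketch below shows one DDPM-style training step in which a denoiser predicts the noise added to a saliency map, conditioned on a video feature map by simple concatenation. The linear noise schedule, the concatenation-style conditioning, and the tiny convolutional denoiser are all assumptions standing in for Saliency-PyramidU-Net, which is not specified here.

```python
# Hypothetical conditional diffusion training step for saliency maps:
# x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
# and the network predicts eps from (x_t, condition, t).
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

class TinyConditionalDenoiser(nn.Module):
    def __init__(self, cond_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + cond_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x_t, cond, t):
        # Broadcast the normalized timestep as an extra bias on the conditioning map.
        cond = cond + (t.float() / T).view(-1, 1, 1, 1)
        return self.net(torch.cat([x_t, cond], dim=1))

def training_step(model, saliency, cond):
    t = torch.randint(0, T, (saliency.shape[0],))
    eps = torch.randn_like(saliency)
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    x_t = ab.sqrt() * saliency + (1 - ab).sqrt() * eps   # forward (noising) process
    return nn.functional.mse_loss(model(x_t, cond, t), eps)

model = TinyConditionalDenoiser()
loss = training_step(model, torch.rand(2, 1, 64, 64), torch.randn(2, 64, 64, 64))
print(loss.item())
```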
ELAFormer: Early Local Attention in multi-scale vision transFormers
IF 7.2, CAS Q1, Computer Science
Knowledge-Based Systems Pub Date: 2025-06-13 DOI: 10.1016/j.knosys.2025.113851
Xin Zhou, Zhaohui Ren, Yongchao Zhang, Zeyu Jiang, Tianzhuang Yu, Hengfa Luo, Shihua Zhou
{"title":"ELAFormer: Early Local Attention in multi-scale vision transFormers","authors":"Xin Zhou,&nbsp;Zhaohui Ren,&nbsp;Yongchao Zhang,&nbsp;Zeyu Jiang,&nbsp;Tianzhuang Yu,&nbsp;Hengfa Luo,&nbsp;Shihua Zhou","doi":"10.1016/j.knosys.2025.113851","DOIUrl":"10.1016/j.knosys.2025.113851","url":null,"abstract":"<div><div>Vision Transformers have demonstrated remarkable success in vision tasks and have shown great potential when compared to CNN-based models. However, Transformers tend to prioritize the global context and overlook the local features between patches. Recent studies suggest that initializing the relative position between query and key tokens can limit attention distance, allowing for effective attention to local features without using convolutional blocks, similar to convolutional kernels. Based on this insight, this paper proposes a new hybrid multi-scale model called <strong>E</strong>fficient <strong>L</strong>ocal <strong>A</strong>ttention trans<strong>F</strong>ormer (ELAFormer). In this model, we propose a Window-based Positional Self-Attention (WPSA) module that focuses on adjacent tokens for short-distance features when querying the key token. Furthermore, we improve the conventional Spatial Reduction Attention (SRA) module by employing Depth-wise Separable (DS) convolution instead of standard down-sampling convolution(DSSRA) for long-distance contexts. By stacking these two modules, extensive experiments demonstrate that our model, with a small size of only 28M, achieves 82.9% accuracy on ImageNet classification with an input size of 224 × 224. Our model outperforms state-of-the-art Transformer models. The small ELAFormer model surpasses the tiny focal transformer by +1.3% mAP with RetinaNet 1x on COCO and +1.8/+2.0% mIoU/MS mIouU with UperNet on ADE20k, serving as a strong backbone for the most challenging computer vision tasks.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"325 ","pages":"Article 113851"},"PeriodicalIF":7.2,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144322129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
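The local-attention idea described for WPSA (limiting attention distance via the relative position between query and key tokens) can be illustrated with a distance-based mask on attention logits, as sketched below on a 1D token sequence. The hard window mask (instead of a learned relative-position initialization), the window size, and the head count are assumptions chosen for clarity, not the module defined in the paper.

```python
# Hypothetical window-limited self-attention: tokens attend only to neighbors within
# a fixed relative distance, mimicking a convolution-like local receptive field.
import torch
import torch.nn as nn

class LocalSelfAttention(nn.Module):
    def __init__(self, dim=256, num_heads=4, window=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.window = window

    def forward(self, x):
        n = x.shape[1]
        idx = torch.arange(n, device=x.device)
        # True = masked out: relative distances beyond the window are not attended to.
        mask = (idx[None, :] - idx[:, None]).abs() > self.window
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out

if __name__ == "__main__":
    layer = LocalSelfAttention()
    print(layer(torch.randn(2, 49, 256)).shape)  # torch.Size([2, 49, 256])
```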
A Structure-Feature Dynamic Decoupling GNN architecture for link prediction
IF 7.2, CAS Q1, Computer Science
Knowledge-Based Systems Pub Date: 2025-06-13 DOI: 10.1016/j.knosys.2025.113883
Guowang Li, Zhiqiang Pan, Fei Cai, Weijie Chen, Langgao Cheng
{"title":"A Structure-Feature Dynamic Decoupling GNN architecture for link prediction","authors":"Guowang Li,&nbsp;Zhiqiang Pan,&nbsp;Fei Cai,&nbsp;Weijie Chen,&nbsp;Langgao Cheng","doi":"10.1016/j.knosys.2025.113883","DOIUrl":"10.1016/j.knosys.2025.113883","url":null,"abstract":"<div><div>Link prediction aims to forecast the missing links within a graph, which is widely applied in various fields such as recommender systems and drug analysis. Graph Neural Networks (GNNs) have emerged as strong baselines for link prediction due to their ability to simultaneously capture the topological structure and node features of graphs. Moreover, existing approaches use the node features as the initial embedding of nodes and input them into GNNs for message passing and updating. However, these methods assume that the features and topology in graphs are homophonic and do not take into account the possible incompatibility that is common and even very severe in some graphs, harming the performance of GNNs in link prediction.</div><div>To address this issue, we propose a Structure-Feature Dynamic Decoupling GNN architecture (SFDDGNN), which mainly consists of two decoupled embedding pipelines and the dynamic gate fusion mechanism. Specifically, to avoid the incompatibility, we first utilize a GraphSAGE-based structure encoder to capture the topological structure in one pipeline. Then we construct a graph contrastive learning module to train the node feature embedding in the other pipeline. Finally, we dynamically aggregate the topology and features embedding based on the graph data distribution knowledge. Experimental results on three real-world datasets of link prediction demonstrate that SFDDGNN outperforms the state-of-the-art baselines by up to 3.54% and 6.55% in terms of AP and AUC, respectively.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"324 ","pages":"Article 113883"},"PeriodicalIF":7.2,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144306217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
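The dynamic gate fusion that combines SFDDGNN's two decoupled pipelines can be pictured as a learned, per-node convex combination of a structure embedding and a feature embedding; the sketch below shows that gate in isolation, together with a dot-product link decoder. It assumes both pipelines already produce same-width node embeddings and omits the GraphSAGE encoder and the contrastive module.

```python
# Hypothetical dynamic gate fusion of structure and feature node embeddings:
# z = g * h_struct + (1 - g) * h_feat, with g predicted per node from both inputs.
import torch
import torch.nn as nn

class GateFusion(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, h_struct, h_feat):
        g = self.gate(torch.cat([h_struct, h_feat], dim=-1))  # per-node, per-dimension gate
        return g * h_struct + (1.0 - g) * h_feat

def link_score(z, edge_index):
    # Dot-product decoder scoring candidate node pairs for link prediction.
    src, dst = edge_index
    return (z[src] * z[dst]).sum(dim=-1)

if __name__ == "__main__":
    fusion = GateFusion()
    z = fusion(torch.randn(100, 128), torch.randn(100, 128))
    scores = link_score(z, (torch.tensor([0, 1, 2]), torch.tensor([3, 4, 5])))
    print(scores.shape)  # torch.Size([3])
```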
A stochastic configuration network based on an improved sampling strategy and Bayesian optimization
IF 7.2, CAS Q1, Computer Science
Knowledge-Based Systems Pub Date: 2025-06-12 DOI: 10.1016/j.knosys.2025.113879
Zihuan Xu, Xia Zhang
{"title":"A stochastic configuration network based on an improved sampling strategy and Bayesian optimization","authors":"Zihuan Xu,&nbsp;Xia Zhang","doi":"10.1016/j.knosys.2025.113879","DOIUrl":"10.1016/j.knosys.2025.113879","url":null,"abstract":"<div><div>Stochastic configuration network (SCN) is an incremental random learning model that achieves fast convergence through a supervised mechanism, making it well-suited for adapting to variations in data characteristics across different regression and classification tasks. However, due to the inherent limitations of fully connected neural networks, the original random sampling strategy may lead to a decline in SCN’s generalization performance. Bayesian optimization, a global optimization method based on Bayesian statistics and Gaussian processes, can efficiently and accurately identify the optimal hyperparameters for model performance. Based on this, this paper proposes a Bayesian optimization-based stochastic configuration network (BO-SCN) algorithm, which integrates an improved sampling strategy and Bayesian optimization. First, a scaling factor <span><math><mi>s</mi></math></span> is introduced as a new hyperparameter into both uniform and normal distribution sampling strategies to ensure that the sampled weight values remain relatively small. Second, Bayesian optimization is employed to automatically select the optimal value of <span><math><mi>s</mi></math></span>, and an optimal search range for <span><math><mi>s</mi></math></span> is proposed, minimizing manual intervention while maximizing model performance. Finally, the performance of BO-SCN is evaluated on ten benchmark datasets. Experimental results demonstrate that the proposed algorithm not only enhances prediction accuracy and maintains stability but also significantly reduces the complexity of hyperparameter tuning.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"325 ","pages":"Article 113879"},"PeriodicalIF":7.2,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144312929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
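The two ingredients in the BO-SCN abstract, a scaling factor s that shrinks the randomly sampled hidden-node weights and an automatic search for a good s, can be illustrated with the sketch below: candidate hidden weights drawn from U(-s, s), an output layer fit by least squares, and a plain search over candidate s values standing in for the Bayesian optimization step. The toy regression data, the candidate range, and the simplified random-feature network (without SCN's supervisory inequality) are all assumptions.

```python
# Hypothetical illustration of the scaling factor s: hidden weights are drawn from
# U(-s, s), the output layer is solved by least squares, and s is chosen by
# validation error (a simple search standing in for Bayesian optimization).
import numpy as np

rng = np.random.default_rng(0)

def fit_random_net(X_tr, y_tr, X_va, y_va, s, hidden=50):
    d = X_tr.shape[1]
    W = rng.uniform(-s, s, size=(d, hidden))    # scaled random input weights
    b = rng.uniform(-s, s, size=hidden)
    H_tr = np.tanh(X_tr @ W + b)
    beta, *_ = np.linalg.lstsq(H_tr, y_tr, rcond=None)   # closed-form output weights
    H_va = np.tanh(X_va @ W + b)
    return float(np.mean((H_va @ beta - y_va) ** 2))

# Toy regression target.
X = rng.uniform(-1, 1, size=(400, 2))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] ** 2
X_tr, y_tr, X_va, y_va = X[:300], y[:300], X[300:], y[300:]

candidates = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]    # assumed search range for s
errors = {s: fit_random_net(X_tr, y_tr, X_va, y_va, s) for s in candidates}
best = min(errors, key=errors.get)
print(f"best s = {best}, validation MSE = {errors[best]:.4f}")
```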
MDPM: Modulating domain-specific prompt memory for multi-domain traffic flow prediction with transformers
IF 7.2, CAS Q1, Computer Science
Knowledge-Based Systems Pub Date: 2025-06-12 DOI: 10.1016/j.knosys.2025.113881
Zhuang Zhuang, Lingbo Liu, Kan Guo, Xingtong Yu, Heng Qi, Yanming Shen, Baocai Yin
{"title":"MDPM: Modulating domain-specific prompt memory for multi-domain traffic flow prediction with transformers","authors":"Zhuang Zhuang ,&nbsp;Lingbo Liu ,&nbsp;Kan Guo ,&nbsp;Xingtong Yu ,&nbsp;Heng Qi ,&nbsp;Yanming Shen ,&nbsp;Baocai Yin","doi":"10.1016/j.knosys.2025.113881","DOIUrl":"10.1016/j.knosys.2025.113881","url":null,"abstract":"<div><div>Multi-domain traffic flow prediction aims to develop a versatile model that uses historical traffic data from various sources to forecast future traffic conditions across these individual datasets. Existing deep traffic prediction models typically focus on mining spatial–temporal relationships in a single dataset. However, there are two limitations should be considered: <strong>Lack of Model Universality</strong>, current traffic prediction research remains constrained by the absence of a universal model adaptable to multiple datasets, restricting performance improvement across diverse scenarios; <strong>Underutilized Cross-Dataset Similarities</strong>, while existing datasets exhibit both exclusive and shared spatial–temporal patterns, effectively leveraging these common patterns to enhance model performance continues to present technical challenges. To overcome the limitations mentioned above, this study introduces a straightforward yet efficient <u>M</u>odulating <u>D</u>omain-Specific <u>P</u>rompt <u>M</u>emory (MDPM) to model complex spatial–temporal interaction and better leverage similar spatial–temporal patterns across diverse datasets. Specifically, our approach is tailored with three key innovations: (1) A domain-shared encoder incorporating intra-modality Spatial–Temporal Rotary Position Encoder (ST2R) to capture universal patterns; (2) A gate fusion mechanism enhanced by contrastive learning with inter-modality ST2R to optimize spatial–temporal feature alignment; (3) Domain-specific learnable prompt vectors that dynamically guide each transformer layer in capturing unique urban traffic characteristics at node-level temporal granularity. <strong>Notably, this architecture achieves state-of-the-art performance without requiring supplementary road network data.</strong> Comprehensive experiments conducted on six real-world public traffic datasets show that our proposed method significantly surpasses existing state-of-the-art approaches.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"325 ","pages":"Article 113881"},"PeriodicalIF":7.2,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144338976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
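MDPM's third ingredient, domain-specific learnable prompt vectors that steer a shared transformer, can be sketched as a per-domain parameter table whose prompts are prepended to the token sequence before a shared encoder layer. The prompt length, dimensions, domain names, and single encoder layer are assumptions; the paper's per-layer, node-level prompting is richer than this.

```python
# Hypothetical domain-specific prompt memory: each traffic domain owns a small set of
# learnable prompt tokens that are prepended to its sequence before a shared encoder.
import torch
import torch.nn as nn

class PromptedSharedEncoder(nn.Module):
    def __init__(self, domains, dim=64, prompt_len=4, num_heads=4):
        super().__init__()
        self.prompts = nn.ParameterDict({
            d: nn.Parameter(torch.randn(prompt_len, dim) * 0.02) for d in domains
        })
        self.encoder = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)

    def forward(self, x, domain):
        # x: (batch, seq_len, dim) traffic-flow tokens for one domain.
        prompt = self.prompts[domain].unsqueeze(0).expand(x.shape[0], -1, -1)
        out = self.encoder(torch.cat([prompt, x], dim=1))
        return out[:, prompt.shape[1]:]          # drop the prompt positions again

if __name__ == "__main__":
    model = PromptedSharedEncoder(domains=["pems04", "pems08"])  # assumed domain names
    y = model(torch.randn(8, 12, 64), domain="pems04")
    print(y.shape)  # torch.Size([8, 12, 64])
```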