Knowledge-Based Systems: Latest Articles

An enhanced CLKAN-RF framework for robust anomaly detection in unmanned aerial vehicle sensor data
IF 7.2 | Q1 (CAS) | Computer Science
Knowledge-Based Systems, Vol. 319, Article 113690 · Pub Date: 2025-05-02 · DOI: 10.1016/j.knosys.2025.113690
Chuanjiang Li, Wenhui Xie, Bing Zheng, Qian Yi, Lei Yang, Bingtao Hu, Chengxin Deng
Abstract: Autonomous flight and real-time control of unmanned aerial vehicles (UAVs) critically rely on onboard sensors, which are susceptible to mechanical and environmental disruptions. Sensor anomalies pose substantial risks to UAV safety, emphasizing the importance of anomaly detection (AD) methods. However, AD remains challenging due to the scarcity of real anomaly data and the intricate spatiotemporal dependencies in sensor readings, which are often obscured by random noise and interference. This paper presents an enhanced framework based on a one-dimensional convolutional neural network (1D CNN), a long short-term memory network (LSTM), and a Kolmogorov-Arnold network (KAN) with residual filtering (CLKAN-RF), utilizing multivariate sensor data without labeled information. First, a correlation analysis is employed to avoid the negative impact of irrelevant parameters on model training. Second, a multiple regression model is designed to comprehensively extract spatial-temporal relationships using the 1D CNN and LSTM, while the KAN is incorporated to non-linearly process complex patterns and optimize the learned features with high accuracy. To address random noise, a bi-directional adaptive exponentially weighted moving average (Bi-AEWMA) scheme is introduced to smooth residuals, complemented by an adaptive dynamic thresholding mechanism that further enhances detection performance. Finally, extensive experiments on real UAV sensor data highlight the superiority of the proposed CLKAN-RF framework, which improves the true positive rate and overall accuracy by an average of 6.43% and 7.63%, respectively, while reducing the false positive rate by an average of 11.96% compared with existing methods, demonstrating its potential for UAV prognostics and health management.
Citations: 0
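A minimal sketch of the residual-filtering stage the abstract describes: a bi-directional adaptive EWMA smooths the regression residuals, and a sliding mean-plus-k-sigma rule flags anomalies. The alpha-adaptation rule, window size, and k below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def aewma(x, alpha_min=0.05, alpha_max=0.5):
    """Adaptive EWMA: larger deviations from the running estimate get a
    larger smoothing weight (the adaptation rule is an assumption)."""
    s = np.empty_like(x)
    s[0] = x[0]
    scale = np.std(x) + 1e-8
    for t in range(1, len(x)):
        dev = min(abs(x[t] - s[t - 1]) / (3 * scale), 1.0)
        alpha = alpha_min + (alpha_max - alpha_min) * dev
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    return s

def bi_aewma(residuals):
    """Bi-directional pass: average a forward and a time-reversed AEWMA."""
    fwd = aewma(residuals)
    bwd = aewma(residuals[::-1])[::-1]
    return 0.5 * (fwd + bwd)

def adaptive_threshold(smoothed, window=100, k=3.0):
    """Flag points exceeding a sliding mean + k*std threshold."""
    flags = np.zeros(len(smoothed), dtype=bool)
    for t in range(len(smoothed)):
        w = smoothed[max(0, t - window):t + 1]
        flags[t] = smoothed[t] > w.mean() + k * w.std()
    return flags

# residuals = |sensor readings - predictions of the CNN-LSTM-KAN regressor|
residuals = np.abs(np.random.randn(1000))
anomalies = adaptive_threshold(bi_aewma(residuals))
```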
RCLMuFN: Relational context learning and multiplex fusion network for multimodal sarcasm detection
IF 7.2 | Q1 (CAS) | Computer Science
Knowledge-Based Systems, Vol. 319, Article 113614 · Pub Date: 2025-05-02 · DOI: 10.1016/j.knosys.2025.113614
Tongguan Wang, Junkai Li, Guixin Su, Yongcheng Zhang, Dongyu Su, Yuxue Hu, Ying Sha
Abstract: Sarcasm typically conveys emotions of contempt or criticism by expressing a meaning that is contrary to the speaker's true intent. Accurately detecting sarcasm aids in identifying and filtering undesirable information on the Internet, thereby mitigating malicious defamation and rumor-mongering. Nonetheless, automatic sarcasm detection remains a challenging task for machines, as it critically depends on intricate factors such as relational context. Existing multimodal sarcasm detection methods focus on introducing graph structures to establish entity relationships between text and image while neglecting to learn the relational context between text and image, which is crucial evidence for understanding the meaning of sarcasm. In addition, the meaning of sarcasm evolves across different contexts, but current methods may struggle to model such dynamic changes accurately, limiting the generalization ability of the models. To address these issues, we propose a relational context learning and multiplex fusion network (RCLMuFN) for multimodal sarcasm detection. First, we employ four feature extractors to comprehensively extract features from raw text and images, aiming to excavate potential features that may have been previously overlooked. Second, we propose a relational context learning module to learn the contextual information of text and images and capture their dynamic properties through shallow and deep interactions. Finally, we propose a multiplex feature fusion module to enhance the model's generalization by effectively integrating multimodal features derived from diverse interaction contexts. Extensive experiments on two multimodal sarcasm detection datasets show that RCLMuFN achieves state-of-the-art performance.
Citations: 0
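The abstract does not spell out the fusion mechanics, so the following is only a generic gated-fusion sketch of the "multiplex feature fusion" idea: features from several text-image interaction paths are combined with learned softmax gates. The shapes, number of paths, and gating scheme are all assumptions.

```python
import torch
import torch.nn as nn

class MultiplexFusion(nn.Module):
    """Weights and sums features from several interaction paths
    (e.g., shallow, deep, and relational-context interactions)."""
    def __init__(self, dim, n_paths=3):
        super().__init__()
        self.gates = nn.Linear(dim * n_paths, n_paths)
        self.proj = nn.Linear(dim, dim)

    def forward(self, paths):                        # list of (B, dim)
        stacked = torch.stack(paths, dim=1)          # (B, n_paths, dim)
        weights = torch.softmax(self.gates(torch.cat(paths, dim=-1)), dim=-1)
        fused = (weights.unsqueeze(-1) * stacked).sum(dim=1)
        return self.proj(fused)                      # (B, dim)

fusion = MultiplexFusion(dim=256)
out = fusion([torch.randn(8, 256) for _ in range(3)])
```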
CL-HOI: Cross-level human–object interaction distillation from multimodal large language models
IF 7.2 | Q1 (CAS) | Computer Science
Knowledge-Based Systems, Vol. 320, Article 113561 · Pub Date: 2025-05-02 · DOI: 10.1016/j.knosys.2025.113561
Jianjun Gao, Chen Cai, Ruoyu Wang, Wenyang Liu, Kim-Hui Yap, Kratika Garg, Boon Siew Han
Abstract: Human–object interaction (HOI) detection often relies on labor-intensive annotations, but multimodal large language models (MLLMs) show potential for recognizing and reasoning about image-level interactions. However, MLLMs are typically computationally heavy and lack instance-level HOI detection capabilities. In this paper, we propose a cross-level HOI distillation (CL-HOI) framework that distills instance-level HOI detection from MLLMs, expanding HOI detection without labor-intensive and expensive manual annotations. Our approach uses CL-HOI as a student model to distill HOIs from a teacher MLLM in two stages: context distillation, where a visual-linguistic translator (VLT) converts visual information into linguistic form, and interaction distillation, where an interaction cognition network (ICN) facilitates interaction reasoning. Contrastive distillation losses transfer image-level context and interactions to the VLT and ICN for instance-level HOI detection. Evaluations on the HICO-DET and V-COCO datasets show that our method outperforms existing weakly supervised approaches, demonstrating its effectiveness in HOI detection without manual annotations.
Citations: 0
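The contrastive distillation losses mentioned above can be illustrated with a standard InfoNCE-style formulation that pulls each student embedding toward its teacher counterpart and away from other samples in the batch; the temperature and in-batch negatives are illustrative assumptions, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def contrastive_distillation(student, teacher, tau=0.07):
    """InfoNCE over a batch: the i-th student feature should be most
    similar to the i-th teacher (MLLM-derived) feature."""
    s = F.normalize(student, dim=-1)   # (B, d) student (VLT/ICN) features
    t = F.normalize(teacher, dim=-1)   # (B, d) teacher MLLM features
    logits = s @ t.T / tau             # (B, B) cosine similarities
    labels = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, labels)

loss = contrastive_distillation(torch.randn(16, 512), torch.randn(16, 512))
```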
Large language model based system with causal inference and Chain-of-Thoughts reasoning for traffic scene risk assessment
IF 7.2 | Q1 (CAS) | Computer Science
Knowledge-Based Systems, Vol. 319, Article 113630 · Pub Date: 2025-05-02 · DOI: 10.1016/j.knosys.2025.113630
Wuchang Zhong, Jinglin Huang, Maoqiang Wu, Weinan Luo, Rong Yu
Abstract: Evaluating potential traffic scene risks is crucial for the decision-making process in autonomous driving systems. Large language models (LLMs), with their advanced scene understanding and reasoning capabilities, offer a new paradigm for decision-making in autonomous driving. However, when used to evaluate traffic scene risks, LLMs face the challenges of model hallucination and slow inference. We therefore developed a system built around TrafficRiskGPT, a large language model specialized for traffic scene risk reasoning. Its core is the LLaMA3-8B model, fine-tuned with LoRA (Low-Rank Adaptation) on a large corpus of traffic risk data so that the model deeply understands traffic scene risks. Around this model, the system establishes a knowledge base of traffic scene risks and incorporates the HNSW (Hierarchical Navigable Small World graphs) method to improve retrieval efficiency, introduces vLLM to improve inference speed, and constructs a comprehensive risk evaluation metric to assess system performance on traffic scene risks. Finally, we designed a CI-CoT (Causal Inference Chain-of-Thought) technique that lets the system evaluate the traffic risks associated with each decision step by step, mitigating hallucination and slow inference. Our experiments show that, in the same scenarios, our method reduces the vehicle collision rate by 7.3% compared with GPT4o-mini, significantly reduces outputs irrelevant to traffic scene risks, and improves inference speed by a factor of 3.62.
Citations: 0
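For reference, LoRA fine-tuning of a LLaMA3-8B base, as the abstract describes, typically looks like the sketch below using Hugging Face peft. The model id, rank, scaling, and target modules are common defaults assumed for illustration; the paper's abstract does not report its hyperparameters.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

config = LoraConfig(
    r=16,                                 # low-rank update dimension
    lora_alpha=32,                        # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)      # base weights stay frozen
model.print_trainable_parameters()        # only the LoRA adapters train
# ...then fine-tune on instruction-formatted traffic-risk examples.
```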
AUSAM: Adaptive Unified Segmentation Anything Model for multi-modality tumor segmentation and enhanced detection in medical imaging
IF 7.2 | Q1 (CAS) | Computer Science
Knowledge-Based Systems, Vol. 319, Article 113588 · Pub Date: 2025-05-01 · DOI: 10.1016/j.knosys.2025.113588
Suraj Sood, Saeed Alqarni, Syed Jawad Hussain Shah, Yugyung Lee
Abstract: Tumor segmentation in medical imaging is critical for diagnosis, treatment planning, and prognosis, yet remains challenging due to limited annotated data, tumor heterogeneity, and modality-specific complexities in CT, MRI, and histopathology. Although the Segment Anything Model (SAM) shows promise as a zero-shot learner, it struggles with irregular tumor boundaries and domain-specific variations. We introduce the Adaptive Unified Segmentation Anything Model (AUSAM), a novel framework that extends SAM's capabilities to multi-modal tumor segmentation by integrating an intelligent prompt module, dynamic sampling, and stage-based thresholding. Specifically, clustering-based prompt learning (DBSCAN for CT/MRI and K-means for histopathology) adaptively allocates prompts to capture challenging tumor regions, while entropy-guided sampling and dynamic thresholding systematically reduce annotation requirements and computational overhead. Validated on diverse benchmarks (LiTS for CT, FLARE 2023 for CT/MRI, and ORCA and OCDC for histopathology), AUSAM achieves state-of-the-art Dice Similarity Coefficients (DSC) of 94.25%, 91.84%, 87.59%, and 91.84%, respectively, with significantly reduced data usage. As the first framework to adapt SAM for multi-modal tumor segmentation, AUSAM sets a new standard for precision, scalability, and efficiency. It is offered in two variants: AUSAM-Lite for resource-constrained environments and AUSAM-Max for maximum segmentation accuracy, thereby advancing medical imaging and clinical decision-making.
Citations: 0
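A sketch of the clustering-based prompt idea for the CT/MRI branch: high-probability pixels from a coarse tumor map are clustered with DBSCAN, and the cluster centroids become point prompts for SAM. The probability threshold and DBSCAN settings are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def clustered_point_prompts(heatmap, prob_thresh=0.5, eps=10, min_samples=20):
    """Cluster likely-tumor pixel coordinates; one centroid per cluster
    becomes an (x, y) point prompt for SAM."""
    ys, xs = np.nonzero(heatmap > prob_thresh)
    coords = np.stack([xs, ys], axis=1)
    if len(coords) == 0:
        return np.empty((0, 2))
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(coords).labels_
    centers = [coords[labels == k].mean(axis=0)
               for k in set(labels) if k != -1]   # label -1 is noise
    return np.array(centers) if centers else np.empty((0, 2))

prompts = clustered_point_prompts(np.random.rand(256, 256))
```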
Kernel contextual fuzzy rule model based on conditional input space partitioning driven by data reconstruction in autoencoder and randomization-based neural networks
IF 7.2 | Q1 (CAS) | Computer Science
Knowledge-Based Systems, Vol. 320, Article 113679 · Pub Date: 2025-05-01 · DOI: 10.1016/j.knosys.2025.113679
Congcong Zhang, Sung-Kwun Oh, Zunwei Fu, Witold Pedrycz
Abstract: Fuzzy rule-based models (FRMs) have attracted significant interest in machine learning owing to their modular architecture, robust design methodologies, and sound interpretability. This study introduces a novel kernel contextual fuzzy rule model (KCFRM) designed for regression tasks. A kernel contextual fuzzy clustering (KCFC) algorithm is proposed for conditional partitioning of the input space. Specifically, we incorporate the Mercer kernel into context fuzzy clustering to enhance the model's ability to distinguish, extract, and amplify useful features via nonlinear mapping, thereby generating more suitable information granules. Additionally, we employ an autoencoder's "encoding-decoding" mechanism to extract differences between data patterns and transform them into KCFC contexts via a conversion function, leading to the creation of high-quality fuzzy sets. In the conclusion part of the fuzzy rules, conventional numerical or linear functions struggle to adequately describe the complex behavior present in local fuzzy regions. To mitigate this, we incorporate a randomization-based neural network (RANN), which provides superior approximation capability with substantial computational efficiency and overcomes the constraints of traditional methods in representing complex behaviors within a fuzzy region, yielding more accurate and efficient rule conclusions. The uniqueness of this study lies in its holistic approach to designing fuzzy models with KCFC and RANN, improving the expressiveness of the rules and enhancing the generalization of the models. KCFRM's performance is assessed on various publicly available machine learning datasets, with experimental results underscoring its effectiveness and performance gains.
Citations: 0
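The kernel and context ingredients can be illustrated with a single Pedrycz-style membership update: a Gaussian (Mercer) kernel induces the distances d^2 = 2(1 - K(x, v)), and the memberships of each sample sum to its context value f_k rather than to 1. This one-step sketch omits the prototype update and the autoencoder-derived contexts.

```python
import numpy as np

def kernel_context_memberships(X, V, f, sigma=1.0, m=2.0):
    """One membership update of context fuzzy clustering with a
    Gaussian kernel. X: (n, d) data, V: (c, d) prototypes,
    f: (n,) context values in [0, 1]."""
    sq = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)   # (n, c)
    K = np.exp(-sq / (2 * sigma ** 2))                    # Mercer kernel
    d2 = 2.0 * (1.0 - K) + 1e-12                          # kernel-induced distance
    ratio = (d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1.0))
    return f[:, None] / ratio.sum(-1)   # row k sums to f_k, not 1

U = kernel_context_memberships(np.random.rand(100, 4),
                               np.random.rand(3, 4),
                               f=np.random.rand(100))
```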
A sequential mixing fusion network for enhanced feature representations in multimodal sentiment analysis
IF 7.2 | Q1 (CAS) | Computer Science
Knowledge-Based Systems, Vol. 320, Article 113638 · Pub Date: 2025-05-01 · DOI: 10.1016/j.knosys.2025.113638
Chenchen Wang, Qiang Zhang, Jing Dong, Hui Fang, Gerald Schaefer, Rui Liu, Pengfei Yi
Abstract: Multimodal sentiment analysis exploits multiple modalities to understand a user's sentiment state from video content. Recent work in this area integrates features derived from different modalities. However, current multimodal sentiment datasets are typically small, with limited cross-modal interaction diversity, so simple feature fusion mechanisms can lead to modality dependence and model overfitting. How to augment diverse cross-modal samples and use non-verbal modalities to dynamically enhance text feature representations therefore remains under-explored. In this paper, we propose a sequential mixing fusion network to tackle this challenge. Using speech text content as the primary source, we design a sequential fusion strategy that maximizes the feature expressiveness enhanced by the auxiliary modalities, namely facial movements and audio features, and a random feature-level mixing algorithm to augment diverse cross-modal interactions. Experimental results on three benchmark datasets show that our proposed approach significantly outperforms current state-of-the-art methods while demonstrating strong robustness when dealing with a missing modality.
Citations: 0
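A sketch of the random feature-level mixing idea: auxiliary audio and visual features are occasionally mixed with shuffled in-batch samples, creating new cross-modal combinations around the primary text features. The mixup-style Beta weighting is an assumption; the paper's exact algorithm may differ.

```python
import torch

def random_feature_mix(text, audio, visual, p=0.5, alpha=0.2):
    """With probability p, convexly mix a modality's features with a
    permuted batch; text is kept intact as the primary source."""
    def mix(x):
        if torch.rand(1).item() > p:
            return x
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        perm = torch.randperm(x.size(0))
        return lam * x + (1 - lam) * x[perm]
    return text, mix(audio), mix(visual)

t, a, v = random_feature_mix(torch.randn(8, 128),
                             torch.randn(8, 128),
                             torch.randn(8, 128))
```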
Enhancing personalized trip recommendations with attractive route analysis and graph attention auto-encoder
IF 7.2 | Q1 (CAS) | Computer Science
Knowledge-Based Systems, Vol. 319, Article 113639 · Pub Date: 2025-05-01 · DOI: 10.1016/j.knosys.2025.113639
Jiqing Gu, Chao Song, Wenjun Jiang, Li Lu, Ming Liu
Abstract: Personalized trip recommendation aims to offer an itinerary of points of interest (POIs) to the user. Many previous works select POIs according to popularity alone. However, the routes between POIs can themselves be attractive to visitors, and some are very popular; such a route, which enhances the user experience, is referred to as an attractive route (AR). In this paper, we investigate attractive routes to enhance personalized trip recommendation. We introduce TRAR, a personalized Trip Recommender with POIs and Attractive Routes, comprising three components: AR discovery, AR evaluation, and trip recommendation. We propose two methods for AR discovery: one discovers ARs by analyzing the Gini coefficient together with POI popularity; the other discovers ARs with a graph attention auto-encoder (GATE). To surface more attractive routes and improve user experience, we take the structural information of a travel graph into account when extracting route features and introduce GATE into AR discovery. In AR evaluation, we estimate the rating scores and preferences of attractive routes by applying a gravity model in a category space. TRAR then balances the trade-off between user experience and time cost by recommending trips that include attractive routes. Experimental results indicate that TRAR is superior to other state-of-the-art algorithms.
Citations: 0
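The Gini coefficient in the first discovery method quantifies how unevenly visits concentrate on a few routes leaving a POI; a skewed distribution hints at an attractive route. A minimal sketch follows (how the signal combines with POI popularity is left out):

```python
import numpy as np

def gini(visits):
    """Gini coefficient of route-visit counts: ~0 when visits spread
    evenly over routes, ~1 when a few routes dominate."""
    x = np.sort(np.asarray(visits, dtype=float))
    n = len(x)
    if n == 0 or x.sum() == 0:
        return 0.0
    i = np.arange(1, n + 1)
    return 2 * np.sum(i * x) / (n * x.sum()) - (n + 1) / n

print(gini([100, 5, 3, 2]))    # highly skewed -> candidate attractive route
print(gini([25, 25, 25, 25]))  # even -> no single route stands out
```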
A Survey of Video Action Recognition Based on Deep Learning
IF 7.2 | Q1 (CAS) | Computer Science
Knowledge-Based Systems, Vol. 320, Article 113594 · Pub Date: 2025-04-30 · DOI: 10.1016/j.knosys.2025.113594
Ping Gong, Xudong Luo
Abstract: Video action recognition (VAR) involves identifying and classifying human actions from video data. Deep learning (DL) has revolutionised VAR, significantly enhancing its accuracy and efficiency. However, large-scale practical applications of DL-based VAR remain limited, underscoring the need for further research and innovation. This survey therefore provides a comprehensive overview of recent advancements in DL-based VAR. Specifically, we summarise the key DL architectures for VAR, including two-stream networks, 3D-CNNs, RNNs, LSTMs, and attention mechanisms, and analyse their strengths, limitations, and benchmark performances. The survey also explores the diverse applications of DL-based VAR, such as surveillance, human-computer interaction, sports analytics, healthcare, and education, while presenting a detailed summary of commonly used datasets and evaluation metrics. Moreover, critical challenges, such as computational demands and the need for robust temporal modelling, are identified, along with potential future directions. By systematically presenting concepts, methodologies, and trends, this paper is a valuable resource for researchers and practitioners striving to advance VAR using DL techniques.
Citations: 0
Radial Adaptive Node Embedding Hashing for cross-modal retrieval
IF 7.2 | Q1 (CAS) | Computer Science
Knowledge-Based Systems, Vol. 319, Article 113522 · Pub Date: 2025-04-30 · DOI: 10.1016/j.knosys.2025.113522
Yunfei Chen, Renwei Xia, Zhan Yang, Jun Long
Abstract: With the rapid growth of multimedia data on social networks, efficient and accurate cross-modal retrieval has become essential. Cross-modal hashing methods offer advantages such as fast retrieval and low storage cost. However, unsupervised deep cross-modal hashing methods often struggle with semantic misalignment and noise, limiting their effectiveness in capturing fine-grained relationships across modalities. To address these challenges, we propose Radial Adaptive Node Embedding Hashing (RANEH), designed to enhance semantic consistency and retrieval efficiency. Specifically, the semantic meta-similarity construction module reconstructs identity semantics using a similarity matrix, ensuring that hash codes retain modality-specific features. The radial adaptive hybrid coding method employs FastKAN as an encoder to map features into a shared hash space, maintaining semantic consistency across modalities. Lastly, the broadcasting node embedding unit leverages the fast Kolmogorov-Arnold network to capture deep modality relationships, improving semantic alignment and node embedding accuracy. Experiments on the NUS-WIDE, MIRFlickr, and MSCOCO datasets show that RANEH consistently outperforms state-of-the-art unsupervised cross-modal hashing methods in accuracy and efficiency. The code is available at https://github.com/YunfeiChenMY/RANEH.
Citations: 0
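A generic sketch of the unsupervised recipe the abstract outlines: a fused cosine-similarity matrix serves as the reconstruction target, and tanh-relaxed hash codes are trained so that their inner products match it (sign() yields binary codes at retrieval time). The fusion weight and loss form are assumptions, not RANEH's exact objective.

```python
import torch
import torch.nn.functional as F

def similarity_matrix(img_feat, txt_feat, eta=0.5):
    """Fuse intra-modal cosine similarities into one target matrix."""
    i = F.normalize(img_feat, dim=-1)
    t = F.normalize(txt_feat, dim=-1)
    return eta * (i @ i.T) + (1 - eta) * (t @ t.T)

def hash_reconstruction_loss(h_img, h_txt, S):
    """Match inner products of relaxed codes to the target similarities."""
    b_img, b_txt = torch.tanh(h_img), torch.tanh(h_txt)  # relax sign()
    code_sim = b_img @ b_txt.T / b_img.size(1)           # values in [-1, 1]
    return F.mse_loss(code_sim, S)

img, txt = torch.randn(16, 64), torch.randn(16, 64)
loss = hash_reconstruction_loss(img, txt, similarity_matrix(img, txt))
```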