Knowledge-Based Systems: Latest Articles

Structure adversarial augmented graph anomaly detection via multi-view contrastive learning
IF 7.6, CAS Tier 1, Computer Science
Knowledge-Based Systems Pub Date: 2026-04-08 Epub Date: 2026-02-01 DOI: 10.1016/j.knosys.2026.115455
Qian Chen, Huiying Xu, Ruidong Wang, Yue Liu, Xinzhong Zhu
Graph anomaly detection is essential for many security-related fields but faces significant challenges in handling complex real-world graph data. Because graph structures are complex and imbalanced, anomalous nodes are hard to identify among the many normal ones. Current contrastive learning methods often overlook structural imperfections in real-world graphs, such as redundant edges and low-degree sparse nodes. Redundant connections may introduce noise during message passing, while sparse nodes receive too little structural information to learn accurate representations, both of which can degrade detection performance. To overcome these challenges, we propose SAA-GCL, a framework that integrates adaptive structure adversarial augmentation with multi-view contrastive learning. Specifically, through edge-weight learning and an LMSE loss, our approach adaptively optimizes the structure of the augmented graph, discarding redundant edges as far as possible while retaining more discriminative features. For low-degree sparse nodes, we mix their ego-networks with those of auxiliary nodes to improve representation quality. To fully mine anomaly information, we use a multi-view contrastive loss that distinguishes positive and negative sample pairs within each view and maintains cross-view consistency. The framework adaptively refines the graph topology to suppress noisy edges and strengthen representations of structurally weak nodes, improving anomaly detection on imbalanced attributed graphs. Comprehensive experiments on six real-world graph datasets show that SAA-GCL surpasses existing methods in detection accuracy. Our code is open source at https://github.com/HZAI-ZJNU/SAAGCL.
Knowledge-Based Systems, Vol. 338, Article 115455.
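The abstract does not spell out the paper's exact loss, but cross-view contrastive objectives of this kind are typically InfoNCE-style: each node's embedding in one view is pulled toward its counterpart in the other view and pushed away from every other node. A minimal pure-Python sketch of that generic objective (illustrative, not SAA-GCL's implementation):

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def infonce_loss(view1, view2, tau=0.5):
    """Cross-view InfoNCE: node i in view1 is the positive of node i in
    view2; all other nodes in view2 act as negatives."""
    n = len(view1)
    total = 0.0
    for i in range(n):
        sims = [math.exp(cosine(view1[i], view2[j]) / tau) for j in range(n)]
        total += -math.log(sims[i] / sum(sims))
    return total / n
```

Aligned views yield a lower loss than views with shuffled node correspondence, which is what drives the representations toward cross-view consistency.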
Citations: 0
Scene-aware memory discrimination: Deciding which personal knowledge stays
IF 7.6, CAS Tier 1, Computer Science
Knowledge-Based Systems Pub Date: 2026-04-08 Epub Date: 2026-02-09 DOI: 10.1016/j.knosys.2026.115496
Yijie Zhong, Mengying Guo, Zewei Wang, Zhongyang Li, Dandan Tu, Haofen Wang
Intelligent devices have become deeply integrated into everyday life, generating vast amounts of user interactions that form valuable personal knowledge. Efficient organization of this knowledge in user memory is essential for personalized applications. However, current research on memory writing, management, and reading with large language models (LLMs) struggles to filter irrelevant information and to contain rising computational costs. Inspired by selective attention in the human brain, we introduce a memory discrimination task. To handle the large-scale interactions and diverse memory standards in this task, we propose a Scene-Aware Memory Discrimination method (SAMD) comprising two key components: the Gating Unit Module (GUM) and the Cluster Prompting Module (CPM). GUM improves processing efficiency by filtering out non-memorable interactions and focusing on the salient content most relevant to application demands. CPM establishes adaptive memory standards that guide LLMs in deciding what information should be remembered or discarded; it also analyzes the relationship between user intents and memory contexts to build effective clustering prompts. Comprehensive direct and indirect evaluations demonstrate the effectiveness and generalization of our approach. Assessed independently, SAMD recalls the majority of memorable data and remains robust in dynamic scenarios. Furthermore, when integrated into personalized applications, SAMD significantly improves both the efficiency and quality of memory construction, leading to better organization of personal knowledge.
Knowledge-Based Systems, Vol. 338, Article 115496.
Citations: 0
Recursive multi-modal retrieval for structured semantic trees in engineering documents
IF 7.6, CAS Tier 1, Computer Science
Knowledge-Based Systems Pub Date: 2026-04-08 Epub Date: 2026-01-29 DOI: 10.1016/j.knosys.2026.115433
Fei Li, Xinyu Li, Jinsong Bao
In lifecycle-oriented manufacturing systems, engineering documents containing text, tables, and images are produced continuously. Retrieval-augmented generation (RAG) models improve document retrieval efficiency and adapt to evolving domain knowledge, but existing methods struggle to achieve accurate cross-modal semantic alignment and high-precision retrieval in engineering documents. To address these limitations, this paper proposes recursive multi-modal retrieval for structured semantic trees (RMR-SST). First, layout analysis extracts multimodal elements and divides metadata into three hierarchical levels: minimal chunks, assembly chunks, and section chunks. Domain rules then compute inter-section semantic relationships and construct the structured semantic trees (SSTs) of engineering documents. Second, a context-aware multimodal semantic alignment strategy embeds multimodal metadata chunks and their semantic relationships into a unified vector space, enabling cross-modal semantic alignment of SSTs. Finally, a recursive abstractive multimodal metadata retrieval algorithm integrates multimodal information across documents at different abstraction levels and generates multimodal retrieval results. Multiple SSTs were constructed from 872 ship-design engineering documents for evaluation. Experiments show that RMR-SST outperforms conventional RAG methods on multimodal retrieval and semantic alignment tasks, achieving a Hit@5 of 88.3% when integrated with the Qwen3-235B model.
Knowledge-Based Systems, Vol. 338, Article 115433.
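Hit@5, the retrieval metric reported above, counts a query as a hit when at least one relevant document appears among the top five results, then averages over queries. A minimal sketch of the standard definition (function names are illustrative, not from the paper):

```python
def hit_at_k(ranked_ids, relevant_ids, k=5):
    """Hit@k for one query: 1 if any relevant item appears in the
    top-k retrieved results, else 0."""
    return int(any(doc in relevant_ids for doc in ranked_ids[:k]))

def mean_hit_at_k(results, k=5):
    """Average Hit@k over (ranked_ids, relevant_ids) pairs."""
    return sum(hit_at_k(r, rel, k) for r, rel in results) / len(results)
```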
Citations: 0
Transferable multi-level spatial-temporal graph neural network for adaptive multi-agent trajectory prediction
IF 7.6, CAS Tier 1, Computer Science
Knowledge-Based Systems Pub Date: 2026-04-08 Epub Date: 2026-01-31 DOI: 10.1016/j.knosys.2026.115451
Yu Sun, Dengyu Xiao, Mengdie Huang, Jiali Wang, Chuan Tong, Jun Luo, Huayan Pu
Accurately predicting future multi-agent trajectories at intersections is crucial yet challenging because traffic environments are complex and dynamic. Existing methods struggle with cross-domain trajectory prediction for two reasons: (1) significant differences in spatiotemporal features between domains lead to insufficient modeling of trajectory temporal dynamics during cross-domain spatiotemporal alignment; and (2) strong heterogeneity of behavioral patterns across datasets causes significant domain shifts, so performance drops markedly when a model is transferred between datasets. To address these challenges, this paper proposes a transferable multi-level spatial-temporal graph neural network (T-MLSTG). Based on maximum mean discrepancy theory, we design a windowed mean gradient discrepancy (WMGD) metric that incorporates the mean and gradient information of temporal features to better capture cross-domain distribution differences. Furthermore, a multi-level spatial-temporal graph network (MLSTG) is designed with a two-level architecture: the first level encodes historical spatiotemporal features independently, while the second level integrates them and employs a channel attention mechanism to enhance feature discrimination. T-MLSTG was evaluated on the inD and INTERACTION datasets. Compared with the baseline model, the cross-domain trajectory prediction results show a reduction in root mean square error (RMSE) of 0.812. In cross-dataset evaluation, the mean error was reduced by 27.8%, demonstrating the method's effectiveness and generalization capability.
Knowledge-Based Systems, Vol. 338, Article 115451.
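WMGD is described as building on maximum mean discrepancy (MMD). The standard biased squared-MMD estimator with an RBF kernel, which a windowed variant would extend, looks like this for 1-D samples (a generic sketch, not the paper's WMGD):

```python
import math

def rbf(x, y, sigma=1.0):
    # Gaussian (RBF) kernel on scalars.
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    """Biased squared-MMD estimator between two 1-D samples:
    E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    kxx = sum(rbf(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy
```

Identical samples give an MMD of zero, and the value grows as the two distributions drift apart, which is what makes it usable as a cross-domain alignment signal.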
Citations: 0
Parameterized image restoration with diffusion and gradient priors
IF 7.6, CAS Tier 1, Computer Science
Knowledge-Based Systems Pub Date: 2026-04-08 Epub Date: 2026-02-05 DOI: 10.1016/j.knosys.2026.115488
Yang Yang, Xi Zhang, Jiaqi Zhang, Lanling Zeng
Diffusion models have demonstrated remarkable performance on image restoration, and most existing restoration methods leverage the diffusion model as a powerful prior. In this paper, we propose PIRP, a method that further integrates the gradient prior, long a popular prior in image restoration. The integration harnesses the strengths of both priors and thus enhances overall restoration quality. More importantly, incorporating the gradient prior makes the method more flexible by enabling parameterized image restoration: it provides an effective way to tweak parameters, which is essential for tailoring satisfactory results. Moreover, we propose a plug-and-play sampling method based on the proposed model that improves restoration quality without any retraining. To validate the proposed method, we conducted extensive experiments on multiple image restoration tasks, including single-image super-resolution, Gaussian deblurring, motion deblurring, and their noisy variants. Both qualitative and quantitative results on popular datasets demonstrate the advantages of the proposed method.
Knowledge-Based Systems, Vol. 338, Article 115488.
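The abstract does not state which gradient prior PIRP uses; a classical instance of the family is the total-variation prior, which scores an image by the sum of absolute differences between neighboring pixels, so that smoother images score lower. Shown here only to illustrate the concept, not as PIRP's actual formulation:

```python
def total_variation(img):
    """Anisotropic total variation of a 2-D image (list of rows):
    sum of absolute horizontal and vertical neighbor differences.
    Lower TV means a smoother image, so minimizing it acts as a
    gradient prior during restoration."""
    h, w = len(img), len(img[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:
                tv += abs(img[i][j + 1] - img[i][j])
            if i + 1 < h:
                tv += abs(img[i + 1][j] - img[i][j])
    return tv
```

A weight on this term is exactly the kind of tunable parameter the abstract refers to: increasing it trades fine detail for smoothness.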
Citations: 0
LLMs for drug-drug interaction prediction using textual drug descriptors
IF 7.6, CAS Tier 1, Computer Science
Knowledge-Based Systems Pub Date: 2026-04-08 Epub Date: 2026-02-04 DOI: 10.1016/j.knosys.2026.115486
Gabriele De Vito, Filomena Ferrucci, Athanasios Angelakis
As treatment plans involve more medications, anticipating and preventing drug-drug interactions (DDIs) becomes increasingly important: such interactions can cause harmful side effects and reduce therapy effectiveness. Most computational approaches for DDI prediction rely heavily on complex feature engineering and require chemical information structured in specific formats. This study presents the first investigation of large language models (LLMs) for DDI prediction using drug characteristics expressed solely in free-text form. Specifically, we use SMILES notations, target organisms, and gene associations as inputs in purpose-designed prompts, allowing LLMs to learn the relationships among these descriptors and predict possible DDIs. We evaluated 18 distinct LLMs under zero-shot, few-shot, and fine-tuning settings on the DrugBank dataset (version 5.1.12) to identify the most effective paradigm, then assessed the generalizability of the fine-tuned models on 13 external DDI datasets against well-known machine learning baselines. While the zero-shot and few-shot paradigms showed only modest utility, fine-tuned models achieved superior sensitivity while maintaining competitive accuracy and F1-score compared with the baselines. Notably, despite its small size, the Phi-3.5 2.7B model attained a sensitivity of 0.978 and an accuracy of 0.919. These findings suggest that computational efficiency and task-specific adaptation matter more than model size for capturing the complex patterns inherent in drug interactions, and they outline a more accessible paradigm for DDI prediction that can be integrated into clinical decision support systems.
Knowledge-Based Systems, Vol. 338, Article 115486.
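The reported sensitivity, accuracy, and F1 follow the standard confusion-matrix definitions; a minimal sketch for reference (not the paper's evaluation code):

```python
def binary_metrics(tp, fp, tn, fn):
    """Sensitivity (recall on the positive class), accuracy, and F1
    computed from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, accuracy, f1
```

High sensitivity is the clinically important quantity here: a missed interaction (false negative) is costlier than a spurious alert.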
Citations: 0
Integrating deep clustering and multi-view graph neural networks for recommender system
IF 7.6, CAS Tier 1, Computer Science
Knowledge-Based Systems Pub Date: 2026-04-08 Epub Date: 2026-02-07 DOI: 10.1016/j.knosys.2026.115449
Jiaxuan Song, Yue Li, Duantengchuan Li, Xiaoguang Wang, Rui Zhang, Hui Zhang, Jinsong Chen
Existing graph neural network recommendation models aggregate neighborhood information with a weighted-sum strategy based on node popularity, but this strategy struggles to model the impact of item category features on user behavior. To alleviate this problem, we propose MDCRec, a graph convolutional recommendation framework integrating deep clustering. MDCRec uses a deep clustering module to mine item category information from item review keyword documents and constructs multi-view subgraphs based on that category information. Popularity-based information aggregation is then performed on each subgraph to obtain per-subgraph node embeddings, which are finally aggregated into each node's final embedding according to the user's interaction distribution across subgraphs. By integrating item category information and users' cross-category interests into information aggregation, MDCRec lets recommendation models capture finer-grained relationships between items and user preferences; it can also work in tandem with other performance-enhancing techniques, such as contrastive learning, to further boost effectiveness. Experimental results on public real-world datasets indicate that most graph neural network recommendation models, including contrastive learning variants, outperform their original popularity-based versions when integrated with the MDCRec aggregation framework, with average improvements of 1.75% in Recall@20 and 1.87% in NDCG@20. Our code is publicly available at https://github.com/dacilab/MDCRec.
Knowledge-Based Systems, Vol. 338, Article 115449.
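Recall@20 and NDCG@20, the metrics quoted above, have standard definitions; a minimal binary-relevance sketch (illustrative, not MDCRec's evaluation code):

```python
import math

def recall_at_k(ranked, relevant, k=20):
    """Fraction of a user's relevant items that appear in the top-k."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k=20):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the DCG
    of an ideal ranking with all relevant items placed first."""
    dcg = sum(1 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1 / math.log2(i + 2)
                for i in range(min(len(relevant), k)))
    return dcg / ideal
```

Unlike recall, NDCG rewards placing relevant items earlier in the list, which is why both are usually reported together.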
Citations: 0
ATARS: Adaptive task-aware feature learning for few-shot fine-grained classification
IF 7.6, CAS Tier 1, Computer Science
Knowledge-Based Systems Pub Date: 2026-04-08 Epub Date: 2026-02-04 DOI: 10.1016/j.knosys.2026.115485
Xiaomei Long, Xinyue Wang, Cheng Yang, Zongbo He, Qian He, Xiangdong Chen
Few-shot fine-grained classification is challenging due to subtle inter-class differences and limited annotations. Existing methods often fail to fully exploit task-level information, limiting adaptation to scarce samples. We present ATARS, a task-aware framework that organizes alignment, feature reconstruction, and task-conditioned channel selection into a coordinated pipeline. These components progressively refine task-adaptive feature representations, enhancing intra-class consistency and discriminative capacity. Extensive experiments on five fine-grained benchmarks demonstrate the effectiveness of this design: ATARS achieves 5-way 5-shot accuracies of 97.38% on Cars, 94.40% on CUB, and 89.78% on Dogs, consistently outperforming previous reconstruction-based and task-aware approaches. The results highlight the benefits of coordinated component design under task-aware guidance in few-shot scenarios. The source code is available at https://github.com/lxm-hjk/ATARS-FSL.
Knowledge-Based Systems, Vol. 338, Article 115485.
Citations: 0
BDGKT: Bidirectional dynamic graph knowledge tracing
IF 7.6, CAS Tier 1, Computer Science
Knowledge-Based Systems Pub Date: 2026-04-08 Epub Date: 2026-02-10 DOI: 10.1016/j.knosys.2026.115532
Xinjia Ou, Tao Huang, Shengze Hu, Huali Yang, Zhuoran Xu, Junjie Hu, Jing Geng
Knowledge tracing (KT) aims to model the evolution of students' knowledge states by analyzing their historical learning trajectories and predicting future performance. Current KT methods, however, focus primarily on unidirectional relationship modeling and overlook the bidirectional dynamic interactions between learners and questions: student knowledge states shape question adaptability through group patterns (e.g., difficulty calibration), while the dynamic transformation of question features provides progressive guidance signals for knowledge advancement across learning stages. We propose bidirectional dynamic graph knowledge tracing (BDGKT) to model the information flow between students and questions while capturing both knowledge state evolution and question characteristic transformation. Specifically, we first introduce a dynamic graph construction based on homogeneous student groups that uses a spatiotemporal constraint strategy to reduce computational costs while improving information propagation quality. We then design a bidirectional message propagation mechanism to capture time-evolving bidirectional signals. To update question nodes (from students to questions), a state-aware attention mechanism aggregates student nodes and responses, revealing group-level question commonalities; to update student nodes (from questions to students), an evolution mechanism aggregates question nodes and responses by timestamp, tracking the evolution of student knowledge states. Extensive experiments on four real-world datasets validate the effectiveness and compatibility of our method. Furthermore, BDGKT improves interpretability by exploring questions' absolute (group-agnostic) and relative (group-dependent) information.
Knowledge-Based Systems, Vol. 338, Article 115532.
Citations: 0
Emo-STCapsNet: A spatio-temporal modeling approach with enhanced CapsNet for speech emotion recognition
IF 7.6, CAS Tier 1, Computer Science
Knowledge-Based Systems Pub Date: 2026-04-08 Epub Date: 2026-02-03 DOI: 10.1016/j.knosys.2026.115447
Yonghong Fan, Heming Huang, Huiyun Zhang, Ziqi Zhou
Speech emotion recognition (SER) aims to enable computers to accurately identify the emotional states embedded in speech signals, a critical capability for human-computer interaction. Effective spatio-temporal feature extraction, which captures consistent emotional patterns while minimizing inter-emotion variability, is critical for SER, yet existing approaches often fall short of learning comprehensive spatio-temporal features. To address this, we propose Emo-STCapsNet, a spatio-temporal modeling approach with an enhanced capsule network. It integrates four components: a temporal dynamic activation block that captures multi-scale temporal variations; a two-stream attentive fusion of past and future context that establishes global emotional representations; a convolutional block that abstracts high-level features from the bidirectional temporal representations; and an attention-enhanced CapsNet that leverages vectorized entity representations and dynamic routing to capture hierarchical spatial relationships among emotional features more effectively than conventional methods such as CNNs. Experimental results on the benchmark SER datasets IEMOCAP, EMODB, and CASIA demonstrate the superior performance of Emo-STCapsNet, with accuracies of 71.86%, 93.46%, and 87.92%, respectively. Comparative results highlight the superiority of Emo-STCapsNet over other methods, and extensive ablation studies further validate its architecture and underscore the necessity of comprehensive spatio-temporal feature learning in SER.
Knowledge-Based Systems, Vol. 338, Article 115447.
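CapsNet-based models like this one typically rely on the squashing nonlinearity from the original capsule-network paper (Sabour et al., 2017), which maps a capsule's output length into [0, 1) while preserving its direction, so vector length can encode entity-presence probability. A standard sketch (generic CapsNet machinery, not Emo-STCapsNet's enhanced variant):

```python
import math

def squash(s, eps=1e-9):
    """Capsule squashing: v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||).
    Direction is preserved; the length is compressed into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq) + eps  # eps guards the zero vector
    scale = norm_sq / (1.0 + norm_sq)
    return [scale * x / norm for x in s]
```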
Citations: 0