{"title":"Multiscale motion-aware and spatial–temporal-channel contextual coding network for learned video compression","authors":"Yiming Wang , Qian Huang , Bin Tang , Xin Li , Xing Li","doi":"10.1016/j.knosys.2025.113401","DOIUrl":"10.1016/j.knosys.2025.113401","url":null,"abstract":"<div><div>Video compression performance depends significantly on accurate motion prediction and efficient entropy coding. However, most current learned video compression methods rely on pre-trained optical flow networks or simplistic lightweight models for motion estimation, which fail to fully leverage the spatial–temporal characteristics of video sequences. This often leads to higher bit consumption and distortion in reconstructed frames. Additionally, these methods frequently overlook the rich contextual information present within feature channels that could enhance entropy modeling. To address these issues, we propose a motion-aware and spatial–temporal-channel contextual coding-based video compression network (MASTC-VC). Specifically, we introduce a multiscale motion-aware module (MS-MAM) that estimates effective motion information across both spatial and temporal dimensions in a coarse-to-fine manner. We also propose a spatial–temporal-channel contextual module (STCCM), which optimizes entropy coding by exploiting latent representation correlations, leading to bit savings from spatial, temporal and channel perspectives. On top of this, we further introduce an uneven channel grouping scheme to strike a balance between computational complexity and rate–distortion (RD) performance. Extensive experiments demonstrate that MASTC-VC outperforms previous learned models across three benchmark datasets. Notably, our method achieves an average of 10.15% BD-rate savings compared to H.265/HEVC (HM-16.20) using the PSNR metric and an average of 23.93% BD-rate savings against H.266/VVC (VTM-13.2) using the MS-SSIM metric.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"316 ","pages":"Article 113401"},"PeriodicalIF":7.2,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143761000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-view feature embedding via shared and specific structural contrastive learning","authors":"Yi Li , Ruojin Zhou , Ling Jing , Hongjie Zhang","doi":"10.1016/j.knosys.2025.113395","DOIUrl":"10.1016/j.knosys.2025.113395","url":null,"abstract":"<div><div>Multi-view feature embedding (MvFE) is a powerful technique for addressing the challenges posed by high-dimensional multi-view data. In recent years, contrastive learning (CL) has gained significant attention due to its superior performance. However, existing CL-based methods primarily focus on promoting consistency between any two cross views, thereby overlooking the diversity among views and impeding the simultaneous exploration of both consistency and complementarity. In this study, we propose a novel MvFE method called shared and specific structural contrastive learning (S3CL), which constructs shared and specific losses to capture both shared and specific potential structural information in multi-view data. Additionally, S3CL introduces a novel view-weighting mechanism that adaptively assigns a weight to each specific loss, enabling a discriminative treatment of each view based on its uniqueness and importance in the feature embedding process. Moreover, to fully explore the view-specific structures while avoiding the emergence of pseudo-structures, a residual mechanism of incomplete fitting is employed in S3CL. Experimental results on five real-world datasets validate the superior performance of our proposed method compared to existing approaches.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"316 ","pages":"Article 113395"},"PeriodicalIF":7.2,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143748356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Document-level event extraction from Italian crime news using minimal data","authors":"Giovanni Bonisoli , David Vilares , Federica Rollo , Laura Po","doi":"10.1016/j.knosys.2025.113386","DOIUrl":"10.1016/j.knosys.2025.113386","url":null,"abstract":"<div><div>Event extraction from unstructured text is a critical task in natural language processing, often requiring substantial annotated data. This study presents an approach to document-level event extraction applied to Italian crime news, utilizing large language models (LLMs) with minimal labeled data. Our method leverages zero-shot prompting and in-context learning to effectively extract relevant event information. We address three key challenges: (1) identifying text spans corresponding to event entities, (2) associating related spans dispersed throughout the text with the same entity, and (3) formatting the extracted data into a structured JSON representation. The findings are promising: LLMs achieve an F1-score of approximately 60% for detecting event-related text spans, demonstrating their potential even in resource-constrained settings. This work represents a significant advancement in utilizing LLMs for tasks traditionally dependent on extensive data, showing that meaningful results are achievable with minimal data annotation. Additionally, the proposed approach outperforms several baselines, confirming its robustness and adaptability to various event extraction scenarios.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"317 ","pages":"Article 113386"},"PeriodicalIF":7.2,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143776418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multitask learning network with interactive fusion for surgical instrument segmentation","authors":"Mengqiu Song , Yunkai Li , Yanhong Liu, Lei Yang","doi":"10.1016/j.knosys.2025.113370","DOIUrl":"10.1016/j.knosys.2025.113370","url":null,"abstract":"<div><div>The advent of surgical robots has enhanced the capabilities of minimally invasive surgery by providing surgeons with increased precision, dexterity, and control during operations. Accurate segmentation of surgical instruments in endoscopic images is critical to achieving these goals, as it allows surgical robots to precisely identify instrument position and orientation, thereby reducing the risk of errors and ensuring safer and more successful procedures. However, the complexity of the surgical environment poses significant challenges to accurate segmentation, such as mirror reflections of surgical instruments, instrument occlusions, and motion disturbances. To address these issues, this paper presents an innovative multitask learning network with interactive fusion to improve the accuracy and robustness of automatic surgical instrument segmentation in endoscopic images during minimally invasive surgeries. Specifically, to effectively handle the diverse lighting conditions and dynamic environments encountered during surgeries, the proposed model leverages a combination of transformer and convolutional neural network (CNN) architectures to effectively extract both the global and local features of surgical instruments. Moreover, to enhance boundary perception of surgical instruments within the context of endoscopic images, the proposed model incorporates an attention-guided multitask learning structure consisting of a main decoder focused on segmenting the instruments and an auxiliary edge decoder aimed at delineating instrument boundaries. In addition, a dual attention enhancement (DAE) block is introduced, which employs attention mechanisms in different directions to enhance the network's focus on key features while suppressing irrelevant ones. Furthermore, given the diverse nature of surgical tools and their interactions within the surgical site, an atrous pyramid attention (APA) block is introduced to improve the network's adaptability to the various shapes and sizes of surgical instruments. Experimental evaluations on two surgical instrument datasets demonstrate that the proposed model achieves superior segmentation performance, validating its effectiveness and highlighting its potential to advance the field of robotic-assisted minimally invasive surgery.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"317 ","pages":"Article 113370"},"PeriodicalIF":7.2,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143776417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MDF-FND: A dynamic fusion model for multimodal fake news detection","authors":"Hongzhen Lv, Wenzhong Yang, Yabo Yin, Fuyuan Wei, Jiaren Peng, Haokun Geng","doi":"10.1016/j.knosys.2025.113417","DOIUrl":"10.1016/j.knosys.2025.113417","url":null,"abstract":"<div><div>Fake news detection has received increasing attention from researchers in recent years, especially multimodal fake news detection involving both text and images. However, many previous studies have simply fed the semantic features of the text and image modalities into a binary classifier after applying basic concatenation or attention mechanisms, even though these features often contain a significant amount of inherent noise. This, in turn, leads to both intra- and inter-modal uncertainty. In addition, while methods based on simple concatenation of the two modalities have achieved notable results, they often ignore the drawback of applying fixed weights across modalities, which causes some high-impact features to be ignored. To address these issues, we propose a novel semantic-level <strong>m</strong>ultimodal <strong>d</strong>ynamic <strong>f</strong>usion framework for <strong>f</strong>ake <strong>n</strong>ews <strong>d</strong>etection (<strong>MDF-FND</strong>). To the best of our knowledge, this is the first attempt to develop a dynamic fusion framework for semantic-level multimodal fake news detection. Specifically, our model consists of two main components: (1) the <strong>U</strong>ncertainty <strong>E</strong>stimation <strong>M</strong>odule (<strong>UEM</strong>), an uncertainty modeling module that uses a multi-head attention mechanism to model intra-modal uncertainty, and (2) the <strong>D</strong>ynamic <strong>F</strong>usion <strong>N</strong>etwork (<strong>DFN</strong>), which is based on Dempster–Shafer evidence theory and is designed to dynamically integrate the weights of the text and image modalities. To further enhance the dynamic fusion framework, a graph attention network is employed for inter-modal uncertainty modeling before the DFN. Extensive experiments have demonstrated the effectiveness of our model across three datasets, with a performance improvement of up to 4% on the Twitter dataset, achieving state-of-the-art performance. We also conducted a systematic ablation study to gain insights into our motivation and architectural design. Our model is publicly available at <span><span>https://github.com/CoisiniStar/MDF-FND</span></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"317 ","pages":"Article 113417"},"PeriodicalIF":7.2,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143759930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Q-Learning-Driven Accelerated Iterated Greedy Algorithm for Multi-Scenario Group Scheduling in Distributed Blocking Flowshops","authors":"Zhen Li , Yuting Wang , Yuyan Han , Kaizhou Gao , Junqing Li","doi":"10.1016/j.knosys.2025.113424","DOIUrl":"10.1016/j.knosys.2025.113424","url":null,"abstract":"<div><div>This paper focuses on the distributed blocking flowshop group scheduling problem under multiple processing time scenarios and due dates (DBFGSP_UPT). Initially, a mathematical model is formulated to achieve a balance between the mean and standard deviation of total tardiness across scenarios, and its correctness is validated via the Gurobi solver. Next, an accelerated iterated greedy algorithm integrated with a Q-learning selection mechanism (<span><math><mrow><mi>Q</mi><mi>A</mi><mi>I</mi><mi>G</mi></mrow></math></span>) is proposed. The <span><math><mrow><mi>Q</mi><mi>A</mi><mi>I</mi><mi>G</mi></mrow></math></span> involves: a rapid evaluation method, tailored to the total tardiness criterion by using a hierarchical approach, which is first proposed to significantly reduce the time complexity of the insertion-based method; a self-calibrating parameter method, which dynamically selects appropriate numbers of groups to be destroyed, designed to improve the diversity of solutions; and a Q-learning mechanism integrated into the local search framework to facilitate the selection of high-quality local search schemes. Finally, we conduct a comparative analysis across 810 test instances. Comprehensive numerical experiments and comparative analyses demonstrate that the proposed <span><math><mrow><mi>Q</mi><mi>A</mi><mi>I</mi><mi>G</mi></mrow></math></span> surpasses existing state-of-the-art algorithms in terms of the average relative percentage increase.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"317 ","pages":"Article 113424"},"PeriodicalIF":7.2,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143785429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fear-constrained personalized and anthropomorphic reinforcement learning for autonomous car-following","authors":"Yufei Zhang, Liang Wu, Zijian Cai, Wenxiao Ma, Xinlun Leng, Wenyuan Sun, Zitong Shan","doi":"10.1016/j.knosys.2025.113433","DOIUrl":"10.1016/j.knosys.2025.113433","url":null,"abstract":"<div><div>Achieving safe, personalized and anthropomorphic driving performance remains challenging for autonomous driving, especially in the car-following scenario. "NeuroAI", which combines neuroscience, brain science and psychology with artificial intelligence (AI), has shown great potential to enhance the performance of AI systems. Drawing inspiration from "NeuroAI", we present fear-constrained personalized and anthropomorphic (FCPA) reinforcement learning (RL) for autonomous car-following. First, a fear model of the ego vehicle driver in the car-following scenario is established. Then, the fear thresholds of drivers with different driving styles are determined by analyzing collected driving data. Finally, the FCPA-RL algorithm is proposed to realize safe, personalized and anthropomorphic autonomous car-following by keeping fear within the corresponding thresholds and designing the reward function based on the probability density functions (PDF) of time headway (THW). Through experimental tests, we demonstrate that FCPA-RL effectively enhances safety during training, achieves personalized and anthropomorphic autonomous car-following, and exhibits more robust generalization across diverse driving scenarios than existing approaches. Furthermore, the results also reveal that FCPA-RL not only learns human drivers' behavioral characteristics but also has the potential to surpass human-level driving performance.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"317 ","pages":"Article 113433"},"PeriodicalIF":7.2,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143785506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Representation distribution matching and dynamic routing interaction for multimodal sentiment analysis","authors":"Zuhe Li , Zhenwei Huang , Xiaojiang He , Jun Yu , Haoran Chen , Chenguang Yang , Yushan Pan","doi":"10.1016/j.knosys.2025.113376","DOIUrl":"10.1016/j.knosys.2025.113376","url":null,"abstract":"<div><div>To address the challenges of distribution discrepancies between modalities, underutilization of representations during fusion, and homogenization of fused representations in cross-modal interactions, we introduce a cutting-edge multimodal sentiment analysis (MSA) framework called representation distribution matching interaction to extract and interpret emotional cues from video data. This framework includes a representation distribution matching module that uses an adversarial cyclic translation network, which aligns the representation distributions of nontextual modalities with those of the textual modality, preserving semantic information while reducing distribution gaps. We also developed the dynamic routing interaction module, which combines four distinct components to form a routing interaction space. This setup efficiently uses modality representations for more effective emotional learning. To combat homogenization, we propose a cross-modal interaction optimization mechanism that maximizes differences in fused representations and enhances mutual information with target modalities, yielding more discriminative fused representations. Our extensive experiments on the MOSI and MOSEI datasets confirm the effectiveness of our MSA framework.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"316 ","pages":"Article 113376"},"PeriodicalIF":7.2,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143761001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sharpening deep graph clustering via diverse bellwethers","authors":"Peiyao Zhao , Xin Li , Yuangang Pan , Ivor W. Tsang , Mingzhong Wang , Lejian Liao","doi":"10.1016/j.knosys.2025.113322","DOIUrl":"10.1016/j.knosys.2025.113322","url":null,"abstract":"<div><div>Deep graph clustering, which leverages the topology structure and attributes of a graph to divide nodes into different groups, has attracted increasing attention in data analysis recently. Most existing deep graph clustering models, however, have compromised performance due to a lack of discriminative representation learning and adequate support for learning diverse clusters. To address these issues, we propose a Diversity-promoting Deep Graph Clustering (DDGC) model that attains the two essential clustering principles of minimizing the intra-cluster variance while maximizing the inter-cluster variance. Specifically, DDGC iteratively optimizes the node representations and cluster centroids. First, DDGC maximizes the log-likelihood of node representations to obtain cluster centroids, which are subjected to a differentiable diversity regularization term that forces the separation among clusters and thus increases inter-cluster variances. Moreover, a minimum entropy-based clustering loss is proposed to sharpen the clustering assignment distributions in order to produce compact clusters, thereby reducing intra-cluster variances. Extensive experimental results on common real-world datasets demonstrate that DDGC achieves state-of-the-art clustering performance and verify the effectiveness of each component. Experiments also verify that DDGC can learn discriminative node representations and alleviate the <em>over-smoothing</em> issue.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"317 ","pages":"Article 113322"},"PeriodicalIF":7.2,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143776532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HKMCNN: Heat Kernel Mesh-Based Convolutional Neural Networks","authors":"Tingting Li , Yunhui Shi , Junbin Gao , Jin Wang , Baocai Yin","doi":"10.1016/j.knosys.2025.113375","DOIUrl":"10.1016/j.knosys.2025.113375","url":null,"abstract":"<div><div>Convolutional neural networks (CNNs) have achieved remarkable results in various computer vision and pattern recognition applications. In computer graphics and geometry processing, however, the focus is on non-Euclidean structured meshed surfaces. Since CNNs operate on Euclidean domains, their fundamental operations, such as convolution and pooling, are not well defined in non-Euclidean domains. To address this issue, we propose a novel mesh representation named Heat Kernel Mesh (HKM), which utilizes heat diffusion on the non-Euclidean domain. The HKM represents a meshed surface as a spatio-temporal graph signal, sampled on the edges of the mesh at each time interval with a Euclidean-like structure. Furthermore, we propose the Heat Kernel Mesh-Based Convolutional Neural Network (HKMCNN), in which convolution, pooling, and an attention mechanism are designed based on the properties of our representation and operate on edges. For fine-grained classification, we propose the distance Heat Kernel Mesh (dHKM), which can identify discriminative features with the HKMCNN to represent a mesh. Extensive experiments on mesh classification and segmentation demonstrate the effectiveness and efficiency of the proposed HKMCNN.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"317 ","pages":"Article 113375"},"PeriodicalIF":7.2,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143769349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}