Knowledge-Based Systems: Latest Articles

Harnessing code domain insights: Enhancing programming Knowledge Tracing with Large Language Models
IF 7.2, Zone 1, Computer Science
Knowledge-Based Systems Pub Date: 2025-04-04 DOI: 10.1016/j.knosys.2025.113396
Xinjie Sun, Qi Liu, Kai Zhang, Shuanghong Shen, Lina Yang, Hui Li
Volume 317, Article 113396
Abstract: Knowledge Tracing (KT) evaluates students’ mastery of knowledge by analyzing their historical interactions with exercises and predicts their performance on subsequent tasks. Although traditional KT methods have begun to focus on the assessment of programming skills, they are limited by the bottleneck of manually annotating knowledge concepts (KCs) and the inadequacy of constructing relationships between these concepts. To address this issue, we propose a Knowledge Tracing method Enhanced by the powerful Code insight capabilities of Large Language Models (CEKT). Specifically, we designed three different prompt tuning strategies for Large Language Models (LLMs) to comprehensively construct Q-matrices that cover KCs and their relationships across various programming domains and exercises. Additionally, we developed a knowledge graph integrating three dimensions to express the complex relationships between KCs in a fine-grained manner, thereby providing a more accurate assessment of students’ knowledge mastery. Furthermore, we established a graph attention network among KCs to promote interaction between representations of similar syntactic KCs, enhancing the inference capability of students’ programming knowledge state and the effectiveness of KT. Through this approach, we achieved high-quality and interpretable knowledge state inference and demonstrated outstanding performance in predicting student outcomes. Our work highlights a potential future research direction for prompt-tuned LLMs in the KT domain, emphasizing high interpretability and efficiency. For broader research purposes, we have prepared to release our data and source code at https://github.com/xinjiesun-ustc/CEKT, encouraging further innovation in this field.
Citations: 0
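As a rough illustration of the Q-matrix mentioned in the abstract above, the sketch below builds a binary exercise-by-KC matrix from a tag mapping. The exercise IDs and KC names are hypothetical, and the mapping is written by hand here; the abstract says CEKT obtains it from prompt-tuned LLMs, which this sketch does not attempt to reproduce.

```python
# Hypothetical exercise -> knowledge-concept (KC) tags; not taken from the paper.
exercise_kcs = {
    "ex1": ["loops", "arrays"],
    "ex2": ["recursion"],
    "ex3": ["loops", "recursion"],
}

def build_q_matrix(tags):
    """Return (exercises, kcs, Q) where Q[i][j] == 1 iff exercise i involves KC j."""
    exercises = sorted(tags)
    kcs = sorted({kc for kc_list in tags.values() for kc in kc_list})
    column = {kc: j for j, kc in enumerate(kcs)}
    q = [[0] * len(kcs) for _ in exercises]
    for i, exercise in enumerate(exercises):
        for kc in tags[exercise]:
            q[i][column[kc]] = 1
    return exercises, kcs, q

exercises, kcs, q_matrix = build_q_matrix(exercise_kcs)
```

Each row of `q_matrix` is one exercise and each column one KC, so a KT model can look up which concepts an interaction exercises.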
SFGNet: Salient-feature-guided real-time building extraction network for remote sensing images
IF 7.2, Zone 1, Computer Science
Knowledge-Based Systems Pub Date: 2025-04-02 DOI: 10.1016/j.knosys.2025.113413
Jin Kuang, Dong Liu
Volume 317, Article 113413
Abstract: Building extraction is crucial for interpreting remote-sensing images. However, existing methods struggle to balance accuracy with inference speed, limiting their support for high concurrency and real-time processing. Although recent approaches have improved segmentation, significant hurdles remain in feature lightweighting, capturing salient features, and ensuring semantic coherence across different characteristics. This paper presents a salient-feature-guided real-time building extraction network (SFGNet), designed to investigate and integrate salient information, such as semantics, details, and borders, thereby improving segmentation performance. First, an effective feature extraction module called Dual-branch Cascade Module (DCM) was developed to extract relevant channel information by learning the shallow details and boundary features of buildings. Additionally, an Offset Feature Alignment Module (OFAM) is designed to minimize the feature offset in both high- and low-frequency connection zones to capture detail and contour edge feature information. A lightweight Context Feature Aggregation Module (CFAM) was implemented in the decoder stage to consolidate local and global features. Finally, a novel hybrid loss function was designed to address the imbalance in single-view, high-density distributions. On the three public datasets (Massachusetts Builds, WHU Aerial Image, and Potsdam Dataset), our model achieves mIoU scores of 75.45%, 89.40%, and 93.16%, respectively. Furthermore, an additional cross-domain experiment on an external untrained real dataset demonstrated outstanding generalization performance. With only 2.397 M parameters, the model reaches 130.62 FPS, outperforming current state-of-the-art models in terms of both segmentation accuracy and inference speed. These results demonstrate the potential of SFGNet for real-time building segmentation. The code is available at https://github.com/gasking/SFGNet.
Citations: 0
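The mIoU figures quoted above follow the usual mean intersection-over-union definition. A minimal sketch of that metric on flattened label maps is given below; the toy labels are made up, and the paper's exact evaluation protocol (class set, averaging over images) is not stated in the abstract, so treat this as the generic formula only.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Flattened toy label maps: 1 = building, 0 = background (made-up values).
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
miou = mean_iou(pred, target, num_classes=2)
```

Here each class has an intersection of 2 pixels and a union of 4, so both per-class IoUs, and their mean, are 0.5.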
TFD-former: Time-frequency domain fusion decoders for effective and robust fault diagnosis under time-varying speeds
IF 7.2, Zone 1, Computer Science
Knowledge-Based Systems Pub Date: 2025-04-02 DOI: 10.1016/j.knosys.2025.113410
Ruichen Ma, Jinglong Chen, Yong Feng, Zitong Zhou, Jingsong Xie
Volume 316, Article 113410
Abstract: Research on fault diagnosis using deep learning methodologies is crucial for ensuring the safety and efficiency of industrial systems. In recent years, numerous analytical approaches have been developed for processing time-domain signals, frequency-domain signals, and time-frequency analysis. This study demonstrates that time-frequency domain fusion decoders exhibit remarkable effectiveness and robustness in fault diagnosis applications. We propose TFDFormer, a novel fault diagnosis framework built on two key designs. First, we introduce a lightweight encoder based on the CNN-Transformer structure to extract signal features from both time and frequency domains, ensuring precise alignment across these domains. Additionally, we develop a contrastive learning loss function specifically tailored for time-frequency domain embeddings to enhance the model's performance. Second, we employ a cross-attention mechanism in the decoder to facilitate efficient feature fusion, enabling seamless integration of domain-specific information. To validate the effectiveness of domain fusion and the diagnostic accuracy of TFDFormer, we evaluate it across three prominent fault diagnosis scenarios: fault diagnosis under time-varying speeds, domain generation fault diagnosis, and fault diagnosis on small-scale datasets. The experiments reveal that time-frequency domain fusion decoders significantly enhance the model's capabilities in addressing complex diagnostic tasks. TFDFormer consistently outperforms state-of-the-art methods in terms of accuracy, robustness, generalization, and computational efficiency, demonstrating its superiority in fault diagnosis applications.
Citations: 0
Integrating signal pairing evaluation metrics with deep learning for wind power forecasting through coupled multiple modal decomposition and aggregation
IF 7.2, Zone 1, Computer Science
Knowledge-Based Systems Pub Date: 2025-04-02 DOI: 10.1016/j.knosys.2025.113394
Yunbing Liu, Jiajun Dai, Guici Chen, Qianlei Cao, Feng Jiang, Wenbo Wang
Volume 317, Article 113394
Abstract: The accurate prediction of wind power is critical for achieving dynamic equilibrium in economic energy scheduling, storage allocation, and generation planning within power systems. To address the challenges of excessive modal decomposition components and low prediction efficiency resulting from the chaotic, intermittent, and non-stationary nature of wind power signals, a sophisticated prediction method integrating aggregate modal decomposition with a hybrid network model is proposed. First, the wind power sequence is decomposed into several primary components using CEEMDAN, and these components are paired to form primary aggregation components, excluding the main trend component. Subsequently, the components of the primary aggregation that exceed the critical threshold of relative sample entropy are re-aggregated and re-decomposed by VMD. Finally, the primary trend component is predicted with LSTM, the primary aggregation components are estimated with BiLSTM, and the secondary decomposition components are predicted with Attention-BiLSTM. These predictive values are then reconstructed to obtain wind power forecasts. Experimental analysis on a wind power dataset has shown that the proposed approach outperforms other models, significantly enhancing prediction efficiency and accuracy.
Citations: 0
A non-local sparse unmixing based hyperspectral change detection with unsupervised deep clustering
IF 7.2, Zone 1, Computer Science
Knowledge-Based Systems Pub Date: 2025-04-02 DOI: 10.1016/j.knosys.2025.113408
Tianqi Gao, Maoguo Gong, Xiangming Jiang, Yue Zhao, Hao Liu, Yan Pu
Volume 317, Article 113408
Abstract: Hyperspectral images (HSIs) are now widely utilized in change detection (CD) tasks because of their rich spectral signatures. To detect and discriminate fine spectral changes between different types, hyperspectral unmixing (HU) methods investigate changes at the subpixel level so as to distinguish the endmembers within each pixel. However, current HU models cannot directly utilize the correlation difference information between temporal HSIs during unmixing. This paper proposes a hyperspectral sparse unmixing CD model, which directly extracts the changed endmembers of the difference matrix and uses their abundance to represent the change information. To improve the unmixing accuracy, a non-local mean strategy has been integrated into the HU model, incorporating the non-local spatial information of HSIs. However, this comes at the cost of increased computational demands. To further expedite non-local sparse unmixing, we apply an unsupervised deep clustering for homogeneous region segmentation to reduce the search space of the non-local mean regularizer, where pixels in the same region possess spectral similarity. A split-and-merge strategy is employed to infer the number of homogeneous regions. For the generated abundance maps of each endmember, we adopt a self-adaptive abundance truncation strategy to search for the optimal threshold for accumulating the abundance matrix and retaining the changed regions. Finally, both the experimental results and theoretical analysis confirm the robustness, potential, and validity of our method across multiple HSI datasets.
Citations: 0
Pruning-enabled dynamic influence maximization using antlion optimization
IF 7.2, Zone 1, Computer Science
Knowledge-Based Systems Pub Date: 2025-04-01 DOI: 10.1016/j.knosys.2025.113406
Sunil Kumar Meena, Shashank Sheshar Singh, Kuldeep Singh
Volume 317, Article 113406
Abstract: Influence maximization (IM) is a widely studied topic in social network analysis that gives a reliable basis to select top nodes (seed set) to maximize the influence. IM has several real-world applications, such as advertising, political campaigns, and profit maximization. Existing literature suggests several algorithms for IM, including nature-inspired algorithms. In addition, most of the algorithms in IM consider static social networks. Existing studies show that antlion optimization (ALO) is known for its exploration abilities, and existing work in IM does not utilize it. Further, overlap influence reduces the overall influence in the network. To address the mentioned issues, for dynamic social networks, the proposed work suggests a novel algorithm (DALO-IM) for IM using ALO. The suggested strategy utilizes the previous computation during the dynamic traversal of the network. Further, this work suggests a prune-based strategy to overcome the problem of overlap influence. The experiments were conducted on eight datasets. The result analysis shows that the influence using the proposed algorithm is higher than the top-performing benchmark algorithm. Furthermore, this work conducted an ablation study to show the effectiveness of the suggested pruning strategy.
Citations: 0
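The "overlap influence" problem named in the abstract above can be seen in a toy setting: two strong seeds that reach the same nodes add little when chosen together. The sketch below uses one-hop coverage as a simplified stand-in for influence spread and a plain greedy selection; the graph is hypothetical, and this is neither DALO-IM nor antlion optimization.

```python
# Toy directed graph (hypothetical): node -> set of nodes it can activate directly.
graph = {
    "a": {"b", "c", "d"},
    "b": {"c", "d"},
    "e": {"f"},
    "c": set(), "d": set(), "f": set(),
}

def coverage(seeds):
    """Nodes reached within one hop of the seed set (the seeds included)."""
    reached = set(seeds)
    for s in seeds:
        reached |= graph[s]
    return reached

def greedy_seeds(k):
    """Pick k seeds, each time adding the node with the largest resulting coverage."""
    seeds = []
    for _ in range(k):
        candidates = sorted(n for n in graph if n not in seeds)
        best = max(candidates, key=lambda n: len(coverage(seeds + [n])))
        seeds.append(best)
    return seeds

seeds = greedy_seeds(2)
```

Node `b` is the second-best single seed, but everything it reaches is already covered by `a`, so the greedy step prefers `e`; discounting such overlapping candidates is the intuition behind a pruning step.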
Multiscale motion-aware and spatial–temporal-channel contextual coding network for learned video compression
IF 7.2, Zone 1, Computer Science
Knowledge-Based Systems Pub Date: 2025-04-01 DOI: 10.1016/j.knosys.2025.113401
Yiming Wang, Qian Huang, Bin Tang, Xin Li, Xing Li
Volume 316, Article 113401
Abstract: Video compression performance is significantly dependent on accurate motion prediction and efficient entropy coding. However, most current learned video compression methods rely on pre-trained optical flow networks or simplistic lightweight models for motion estimation, which fail to fully leverage the spatial–temporal characteristics of video sequences. This often brings higher bit consumption and distortion in reconstructed frames. Additionally, these methods frequently overlook the rich contextual information present within feature channels that could enhance entropy modeling. To address these issues, we propose a motion-aware and spatial–temporal-channel contextual coding-based video compression network (MASTC-VC). Specifically, we introduce a multiscale motion-aware module (MS-MAM) that estimates effective motion information across both spatial and temporal dimensions in a coarse-to-fine manner. We also propose a spatial–temporal-channel contextual module (STCCM) which optimizes entropy coding by exploiting latent representation correlations, leading to bit savings from spatial, temporal and channel perspectives. On top of it, we further introduce an uneven channel grouping scheme to strike a balance between computational complexity and rate–distortion (RD) performance. Extensive experiments demonstrate that MASTC-VC outperforms previous learned models across three benchmark datasets. Notably, our method achieves an average 10.15% BD-rate savings compared to H.265/HEVC (HM-16.20) using the PSNR metric and an average 23.93% BD-rate savings against H.266/VVC (VTM-13.2) using the MS-SSIM metric.
Citations: 0
Multi-view feature embedding via shared and specific structural contrastive learning
IF 7.2, Zone 1, Computer Science
Knowledge-Based Systems Pub Date: 2025-04-01 DOI: 10.1016/j.knosys.2025.113395
Yi Li, Ruojin Zhou, Ling Jing, Hongjie Zhang
Volume 316, Article 113395
Abstract: Multi-view feature embedding (MvFE) is a powerful technique for addressing the challenges posed by high-dimensional multi-view data. In recent years, contrastive learning (CL) has gained significant attention due to its superior performance. However, existing CL-based methods primarily focus on promoting consistency between any two cross views, thereby overlooking the diversity among views and impeding the simultaneous exploration of both consistency and complementarity. In this study, we propose a novel MvFE method called shared and specific structural contrastive learning (S3CL), which constructs shared and specific losses to capture both shared and specific potential structural information in multi-view data. Additionally, S3CL introduces a novel view-weighting mechanism that adaptively assigns a weight to each specific loss, enabling a discriminative treatment of each view based on its uniqueness and importance in the feature embedding process. Moreover, to fully explore the view-specific structures while avoiding the emergence of pseudo-structures, a residual mechanism of incomplete fitting is employed in S3CL. Experimental results on five real-world datasets validate the superior performance of our proposed method compared to existing approaches.
Citations: 0
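For readers unfamiliar with the cross-view consistency objective that the abstract above says existing CL-based methods rely on, a generic InfoNCE loss between two views can be sketched as follows. This is the standard textbook form, not the shared or specific losses of S3CL; the embeddings and the temperature value are made up for illustration.

```python
import math

def info_nce(view_a, view_b, temperature=0.5):
    """Cross-view InfoNCE: row i of view_a should match row i of view_b."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))  # assumes non-zero vectors
        return [x / norm for x in v]

    a = [normalize(v) for v in view_a]
    b = [normalize(v) for v in view_b]
    loss = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of sample i in view A with every sample in view B.
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        log_denominator = math.log(sum(math.exp(l) for l in logits))
        loss += log_denominator - logits[i]  # -log softmax of the positive pair
    return loss / len(a)

view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(view_a, [[0.9, 0.1], [0.1, 0.9]])
swapped = info_nce(view_a, [[0.1, 0.9], [0.9, 0.1]])
```

The loss is lower when matching rows of the two views point the same way (`aligned`) than when they are swapped, which is the sense in which it promotes consistency between views.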
Document-level event extraction from Italian crime news using minimal data
IF 7.2, Zone 1, Computer Science
Knowledge-Based Systems Pub Date: 2025-04-01 DOI: 10.1016/j.knosys.2025.113386
Giovanni Bonisoli, David Vilares, Federica Rollo, Laura Po
Volume 317, Article 113386
Abstract: Event extraction from unstructured text is a critical task in natural language processing, often requiring substantial annotated data. This study presents an approach to document-level event extraction applied to Italian crime news, utilizing large language models (LLMs) with minimal labeled data. Our method leverages zero-shot prompting and in-context learning to effectively extract relevant event information. We address three key challenges: (1) identifying text spans corresponding to event entities, (2) associating related spans dispersed throughout the text with the same entity, and (3) formatting the extracted data into a structured JSON. The findings are promising: LLMs achieve an F1-score of approximately 60% for detecting event-related text spans, demonstrating their potential even in resource-constrained settings. This work represents a significant advancement in utilizing LLMs for tasks traditionally dependent on extensive data, showing that meaningful results are achievable with minimal data annotation. Additionally, the proposed approach outperforms several baselines, confirming its robustness and adaptability to various event extraction scenarios.
Citations: 0
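The F1-score reported above for span detection can be illustrated with a small scoring function. The sketch assumes exact matching of `(start, end, label)` triples; the abstract does not state the paper's matching criterion, and the offsets and label names below are hypothetical.

```python
def span_f1(predicted, gold):
    """Exact-match precision, recall and F1 over sets of (start, end, label) spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical character offsets and entity labels for one news article.
gold = [(0, 12, "AUTHOR"), (20, 31, "VICTIM"), (40, 46, "LOCATION")]
pred = [(0, 12, "AUTHOR"), (20, 31, "LOCATION"), (40, 46, "LOCATION")]
precision, recall, f1 = span_f1(pred, gold)
```

Two of the three predicted spans match a gold span exactly, so precision, recall and F1 all come out at 2/3; the mislabeled middle span counts as both a false positive and a false negative.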
A multitask learning network with interactive fusion for surgical instrument segmentation
IF 7.2, Zone 1, Computer Science
Knowledge-Based Systems Pub Date: 2025-03-31 DOI: 10.1016/j.knosys.2025.113370
Mengqiu Song, Yunkai Li, Yanhong Liu, Lei Yang
Volume 317, Article 113370
Abstract: The advent of surgical robots has enhanced the capabilities of minimally invasive surgery by providing surgeons with increased precision, dexterity, and control during operations. Accurate segmentation of surgical instruments in endoscopic images is critical to achieving these goals, as it allows surgical robots to precisely identify the instrument position and orientation, thereby reducing the risk of errors and ensuring safer and more successful procedures. However, the complexity of the surgical environment poses significant challenges to the accurate segmentation of surgical instruments, such as mirror reflections of surgical instruments, instrument occlusions, and motion disturbances. To address these issues, this paper presents an innovative multitask learning network with interactive fusion to increase the automatic segmentation accuracy and robustness of surgical instruments in endoscopic images during minimally invasive surgeries. Specifically, to effectively handle the diverse lighting conditions and dynamic environments encountered during surgeries, the proposed model leverages a combination of transformer and convolutional neural network (CNN) architectures to effectively extract both the global and local features of surgical instruments. Moreover, to enhance the boundary perception capability of surgical instruments within the context of endoscopic images, the proposed model incorporates an attention-guided multitask learning structure consisting of a main decoder focused on segmenting the instruments and an auxiliary edge decoder aimed at delineating the boundaries of the instruments. In addition, a dual attention enhancement (DAE) block is introduced, which employs attention mechanisms in different directions to enhance the network’s focus on key features while suppressing irrelevant features. Furthermore, given the diverse nature of surgical tools and their interactions within the surgical site, an atrous pyramid attention (APA) block is introduced to improve the network’s adaptability to the various shapes and sizes of surgical instruments. Experimental evaluations on two surgical instrument datasets demonstrate that the proposed model achieves superior segmentation performance, validating its effectiveness and highlighting its potential to advance the field of robotic-assisted minimally invasive surgery.
Citations: 0
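An auxiliary edge decoder such as the one described above needs boundary targets to train against. The abstract does not say how they are obtained; one common approach, assumed here purely for illustration, is to derive them from the segmentation mask by marking foreground pixels that touch background.

```python
def boundary_map(mask):
    """Mark foreground pixels that touch background (or the image border), 4-connectivity."""
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c] != 1:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < h and 0 <= nc < w)
                if outside or mask[nr][nc] == 0:
                    edges[r][c] = 1
                    break
    return edges

# Toy binary instrument mask (made-up values): a 3x3 blob on a 5x5 image.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
edges = boundary_map(mask)
```

For this blob the eight outer foreground pixels are marked as boundary and the single interior pixel is not, giving a second target map that can be supervised alongside the mask.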