Instance-aware context with mutually guided vision-language attention for referring image segmentation
Qiule Sun, Jianxin Zhang, Bingbing Zhang, Peihua Li
Applied Intelligence 55(13). DOI: 10.1007/s10489-025-06851-1. Published 2025-08-30.

Abstract: Referring image segmentation, which integrates both visual and linguistic modalities, represents a forefront challenge in cross-modal visual research. Traditional approaches generally fuse linguistic features with visual data to generate multi-modal representations for mask decoding. However, these methods often mistakenly segment visually prominent entities rather than the specific region indicated by the referring expression, as the visual context tends to overshadow the multi-modal features. To address this, we introduce IMNet, a novel referring image segmentation framework that harnesses the Contrastive Language-Image Pre-training (CLIP) model and incorporates a mutually guided vision-language attention mechanism to identify the referring mask more accurately. Specifically, this mechanism consists of language-guided attention and vision-guided attention, which model bi-directional relationships between visual and linguistic features. Additionally, to accurately segment instances based on referring expressions, we develop an instance-aware context module within the decoder that focuses on learning instance-specific features. This module connects instance prototypes with corresponding features, using linearly weighted prototypes for the final prediction. We evaluate the proposed method on three publicly available datasets, i.e., RefCOCO, RefCOCO+, and G-Ref. Comparisons with previous methods demonstrate that our approach achieves competitive performance.
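The abstract gives no implementation details, but the bi-directional attention it describes can be sketched as two standard multi-head attention layers, one in each direction. The module name, dimensions, and residual connections below are illustrative assumptions, not IMNet's actual design.

```python
import torch
import torch.nn as nn

class MutualGuidedAttention(nn.Module):
    """Sketch of bi-directional vision-language cross-attention (hypothetical layout)."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.lang_to_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.vis_to_lang = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vis: torch.Tensor, lang: torch.Tensor):
        # Language-guided attention: visual tokens query the expression tokens.
        vis_ctx, _ = self.lang_to_vis(query=vis, key=lang, value=lang)
        # Vision-guided attention: expression tokens query the visual tokens.
        lang_ctx, _ = self.vis_to_lang(query=lang, key=vis, value=vis)
        return vis + vis_ctx, lang + lang_ctx

vis = torch.randn(2, 1024, 256)   # e.g. a flattened 32x32 visual feature map
lang = torch.randn(2, 20, 256)    # e.g. 20 word embeddings
v, l = MutualGuidedAttention()(vis, lang)
print(v.shape, l.shape)           # torch.Size([2, 1024, 256]) torch.Size([2, 20, 256])
```
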
Educational knowledge graph based intelligent question answering for automatic control disciplines
Zhiwei Cai, Nuoying Xu, Linqin Cai, Bo Ren, Yu Xiong
Applied Intelligence 55(13). DOI: 10.1007/s10489-025-06847-x. Published 2025-08-30.

Abstract: With the further development of education informatization, Educational Knowledge Graph (EKG) based intelligent Question Answering (KGQA) has attracted significant attention in smart education. However, current educational KGQA faces substantial challenges, such as incomplete questions from students, knowledge dispersed across the EKG, and scarce, imbalanced datasets. In this paper, a novel educational KGQA model is proposed for answering students' questions in automatic control disciplines. First, a topic entity detection algorithm is constructed based on BERT-BiLSTM-CRF and a domain dictionary, and an intention recognition algorithm is built based on BERT and TextCNN; the topic entity is located accurately by formulating entity priority, entity completion rules, and similarity calculation. Then, a custom weighted cross-entropy loss function (CCL) is designed to alleviate the influence of imbalanced training samples on the model's classifier. In addition, the first Chinese dataset for educational KGQA in automatic control disciplines (ACKGQA) is constructed. Finally, extensive experiments evaluate the effectiveness and generalization of the proposed KGQA model on the ACKGQA dataset and five public benchmark datasets. The proposed model obtains a recognition precision of 87.5% and a recall of 86.25% on ACKGQA and exhibits better overall performance on the other five benchmark datasets. Experimental results demonstrate that our educational KGQA model achieves outstanding performance in the face of the imbalanced data inherent in educational knowledge graphs.
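The exact form of the paper's custom weighted cross-entropy loss (CCL) is not given in the abstract; a generic inverse-frequency weighted cross-entropy, which targets the same class-imbalance problem, might look like the following PyTorch sketch. The function name and weighting rule are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def weighted_ce_loss(logits: torch.Tensor, targets: torch.Tensor,
                     class_counts: torch.Tensor) -> torch.Tensor:
    """Inverse-frequency weighted cross-entropy: rarer classes get larger weights.
    Illustrative only; the paper's CCL weighting scheme is not reproduced here."""
    weights = class_counts.sum() / (len(class_counts) * class_counts.float())
    return F.cross_entropy(logits, targets, weight=weights)

# Toy example: 3 intent classes with sample counts 500, 50, and 10.
logits = torch.randn(8, 3)
targets = torch.randint(0, 3, (8,))
loss = weighted_ce_loss(logits, targets, torch.tensor([500, 50, 10]))
print(loss.item())
```
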
Balanced Loss Function for Long-tailed Semi-supervised Ship Detection
Li-Ying Hao, Jia-Rui Yang, Yunze Zhang
Applied Intelligence 55(13). DOI: 10.1007/s10489-025-06838-y. Published 2025-08-30.

Abstract: Semi-supervised learning (SSL) has significantly reduced the reliance of ship detection networks on labeled images. However, the more realistic and challenging issue of long-tailed distributions in SSL remains largely unexplored. While most existing methods address this issue at the instance level through reweighting or resampling techniques, their performance is significantly limited by their dependence on biased backbone representations. To overcome this limitation, we propose a Balanced Loss function (Bal Loss). Our approach consists of three key components. First, we introduce the BaCon Loss, which computes class-wise feature centers as positive anchors and selects negative anchors through a simple yet effective mechanism. Second, we assume that the normalized features in contrastive learning follow a mixture of von Mises-Fisher (vMF) distributions on the unit hypersphere. This assumption allows us to estimate the distribution parameters using only the first sample moment, which can be computed efficiently online across different batches. Finally, we incorporate a Jitter-Bagging module, adapted from prior literature, to provide precise localization information, thereby refining bounding box predictions. Extensive experiments demonstrate the efficacy of Bal Loss, which achieves state-of-the-art results on ship datasets with an improvement of 3.9 over the baseline. Notably, our method attains an AP^r of 44.1 on the ShipRSImageNet dataset, underscoring its robust detection capabilities.
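As a rough illustration of "class-wise feature centers as positive anchors", the sketch below normalizes features to the unit hypersphere, builds per-class centers, and scores each feature against all centers. It is not the paper's BaCon loss or its vMF parameter estimation; the temperature value and loss shape are assumptions.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(feats, labels, num_classes, tau=0.1):
    """Prototype-based contrastive term: a sample's own class center is the
    positive anchor, all other centers act as negatives. Illustrative only."""
    feats = F.normalize(feats, dim=1)                  # unit-norm features
    centers = torch.zeros(num_classes, feats.size(1))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            centers[c] = F.normalize(feats[mask].mean(dim=0), dim=0)
    logits = feats @ centers.t() / tau                 # cosine similarity to every center
    return F.cross_entropy(logits, labels)

feats = torch.randn(16, 128)
labels = torch.randint(0, 4, (16,))
print(prototype_contrastive_loss(feats, labels, num_classes=4).item())
```
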
MDT: A multiscale differencing transformer with sequence feature relationship mining for robust action recognition
Zengzhao Chen, Fumei Ma, Hai Liu, Wenkai Huang, Tingting Liu
Applied Intelligence 55(13). DOI: 10.1007/s10489-025-06861-z. Published 2025-08-30.

Abstract: Skeleton-based action recognition, which analyzes joint coordinates and bone connections to classify human actions, is important for understanding and analyzing human dynamic behaviors. However, actions in complex scenes exhibit a high degree of similarity and variability; dynamic changes in human skeletons and subtle temporal variations in particular pose significant challenges to the accuracy and robustness of action recognition systems. To mitigate these challenges, we propose a novel multiscale differencing transformer (MDT) with sequence feature relationship mining for robust action recognition. MDT effectively mines inter-frame timing information and feature distribution differences across multiple scales, enabling a deeper understanding of the nuances between actions. Specifically, we first propose multiscale differential self-attention to capture action changes across multiple time scales, improving the model's ability to represent both global and local dynamic features of actions. Then, we introduce a sequence feature relationship mining module to address complex data patterns that may span multiple sequences and exhibit both similar and distinct characteristics. By utilizing coarse- and fine-grained sequence information, this module empowers the model to recognize intricate data patterns. On the NTU RGB+D 60 dataset, the proposed MDT model outperforms the recent STAR-Transformer by 1.6% on the Cross-Subject (CS) setting and 1.1% on the Cross-View (CV) setting, demonstrating its consistent effectiveness across different evaluation protocols.
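A minimal sketch of multiscale temporal differencing on a skeleton sequence follows. The tensor layout (batch, frames, joints, channels) and the stride set are assumptions; the differential self-attention the paper builds on top of such differences is omitted.

```python
import torch

def multiscale_temporal_diff(x: torch.Tensor, scales=(1, 2, 4)):
    """x: (batch, frames, joints, channels) skeleton sequence.
    Returns one difference tensor per scale, x[:, t] - x[:, t - s],
    zero-padded at the start so every scale keeps the original length."""
    diffs = []
    for s in scales:
        d = x[:, s:] - x[:, :-s]
        pad = torch.zeros_like(x[:, :s])
        diffs.append(torch.cat([pad, d], dim=1))
    return diffs

seq = torch.randn(2, 64, 25, 3)   # e.g. 64 frames, 25 joints, xyz coordinates
for s, d in zip((1, 2, 4), multiscale_temporal_diff(seq)):
    print(s, d.shape)             # each is (2, 64, 25, 3)
```
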
Multiview unsupervised domain adaptation through consensus augmented masking for subspace alignment
Chenyang Zhu, Weibin Luo, Yunxin Xie, Lipei Fu
Applied Intelligence 55(13). DOI: 10.1007/s10489-025-06834-2. Published 2025-08-30.

Abstract: Unsupervised Domain Adaptation (UDA) focuses on bridging the gap between source and target domain distributions. Existing UDA approaches often struggle to capture the diverse contextual dependencies required to resolve ambiguities in visual feature representations. To overcome these challenges, we propose Consensus Augmented Masking for Subspace Alignment (CAMSA), a framework that leverages multiview representations to enhance contextual diversity and establish a consensus subspace for improved domain alignment. First, multiple models are trained independently with distinct masking augmentations to ensure prediction consistency and extract specialized multiview features, each capturing a unique contextual perspective. These multiview features are unified into a low-rank structure via sparse subspace representation, enabling cross-view consensus and robust domain alignment. The unified representation is further optimized by constructing a consensus affinity matrix, which facilitates learning a projection matrix that embeds the multiview features into a latent subspace. Within this latent space, source domain prototypes and k-means clustering on the target domain are used to estimate conditional probabilities for downstream tasks. Extensive empirical evaluations on standard benchmark datasets highlight the strong performance of CAMSA, which consistently surpasses state-of-the-art UDA methods across a variety of architectures and configurations, underscoring the importance of leveraging diverse contextual views for robust domain alignment.
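The final step the abstract describes, source prototypes plus k-means clustering on the target domain, can be illustrated as below. The projected features are assumed to be given, and the cluster-to-class matching rule is a simplification rather than CAMSA's consensus-affinity procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def target_pseudo_labels(src_z, src_y, tgt_z, num_classes):
    """Illustrative: class prototypes from projected source features, k-means on
    projected target features, clusters mapped to the nearest prototype."""
    prototypes = np.stack([src_z[src_y == c].mean(axis=0) for c in range(num_classes)])
    km = KMeans(n_clusters=num_classes, n_init=10, random_state=0).fit(tgt_z)
    # Assign each target cluster center to its closest source prototype.
    dists = np.linalg.norm(km.cluster_centers_[:, None] - prototypes[None], axis=2)
    cluster_to_class = dists.argmin(axis=1)
    return cluster_to_class[km.labels_]

src_z = np.random.randn(200, 64)                 # projected source features
src_y = np.random.randint(0, 5, 200)             # source labels
tgt_z = np.random.randn(150, 64)                 # projected target features
print(target_pseudo_labels(src_z, src_y, tgt_z, num_classes=5)[:10])
```
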
Industrial-application-oriented 2D image and 3D object anomaly detection technology: a comprehensive review
Gang Li, Chengrun Jiang, Min Li, Jiachen Li, Delong Han, Mingle Zhou
Applied Intelligence 55(13). DOI: 10.1007/s10489-025-06689-7. Published 2025-08-28.

Abstract: With the rapid development of deep learning, industrial anomaly detection has significantly improved its ability to handle large-scale images and point clouds and has gradually been applied in complex industrial environments. However, current reviews of anomaly detection are often technology-oriented, and a systematic classification oriented to practical industrial scenarios is still needed. Given these considerations, we summarize and categorize the latest anomaly detection technologies from the perspective of specific industrial application scenarios, covering 2D image anomaly detection, 3D object anomaly detection, and datasets. This application-oriented classification can more effectively meet the practical needs of anomaly detection tasks in industrial production. Furthermore, we contribute a comprehensive analysis of the current state and challenges of industrial anomaly detection, offer insights into customizing deep learning for real-world industrial applications, and present an outlook on future research directions.
ExQUAL: an explainable quantum machine learning classifier
Karuna Kadian, Sunita Garhwal, Ajay Kumar
Applied Intelligence 55(13). DOI: 10.1007/s10489-025-06732-7. Published 2025-08-28.

Abstract: Quantum machine learning (QML) holds the potential to solve complex tasks that classical machine learning cannot handle. QML is a promising, rapidly evolving field, which makes a deeper understanding of the intricate black-box nature of QML models all the more necessary. To address this challenge, the incorporation of explainable artificial intelligence becomes imperative. This paper introduces a novel approach, the Explainable Quantum Classifier (ExQUAL), which integrates the Local Interpretable Model-agnostic Explanations (LIME) framework and SHapley Additive exPlanations (SHAP) with the Pegasos Quantum Support Vector Machine (QSVM) model for classification tasks. ExQUAL provides a methodology for applying these frameworks to both binary and multi-class classification tasks and yields both local and global explanations. This approach seeks to enhance transparency and interpretability while advancing the applicability and trustworthiness of quantum machine learning methodologies.
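As a rough sketch of the kind of post-hoc explanation ExQUAL attaches to a classifier, the example below runs LIME on a classical SVC standing in for the Pegasos QSVM; LIME (like SHAP) only needs a predict_proba-style function, so the same call pattern would apply to a quantum kernel classifier. This is not the ExQUAL pipeline itself.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from lime.lime_tabular import LimeTabularExplainer

# Classical SVC stands in for the Pegasos QSVM; LIME only needs predict_proba.
data = load_iris()
clf = SVC(probability=True).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    mode="classification",
)
exp = explainer.explain_instance(data.data[0], clf.predict_proba, num_features=4)
print(exp.as_list())   # per-feature contributions for this single prediction
```
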
Knee-cartilage segmentation from MR images using Multi-view Hypergraph Convolutional Neural Networks
Christos Chadoulos, John Theocharis, Andreas Symeonidis, Serafeim Moustakidis
Applied Intelligence 55(13). DOI: 10.1007/s10489-025-06808-4. Published 2025-08-28. Open access PDF: https://link.springer.com/content/pdf/10.1007/s10489-025-06808-4.pdf

Abstract: Leveraging the increased capacity of hypergraphs to model complex data structures, we propose the Multi-view Hyper-Graph Convolutional Network (MVHGCN) to yield automated knee-joint cartilage segmentations from MRIs. The main properties of our approach are as follows: 1) Node features are obtained from multi-view (MV) acquisitions, corresponding to different feature extractors or image modalities. 2) Node embeddings are generated using a distributive MV convolution scheme that combines the various view-specific convolutions; the results are aggregated via an attention-based fusion module that automatically learns the weights of the different views. 3) Our model integrates local- and global-level learning simultaneously: local hypergraph convolutions explore the relationships across the spatially aligned node libraries, while global hypergraph convolutions search for global affinities between nodes located at different positions within the image. 4) We propose two blending schemes to combine local and global convolutions, namely the cross-talk (CT) and the collaborative (COL) blending units. Using these units as building blocks, we construct the MVHGCN model, a deep network with enhanced feature representation and learning capabilities. The suggested segmentation method is evaluated on the publicly available Osteoarthritis Initiative (OAI) cohort. Specifically, we have designed a thorough experimental setup, including parameter sensitivity analysis and comparative results against a series of existing traditional methods, deep CNN models, and graph convolutional networks. The results show that MVHGCN outperforms the competing methods, achieving overall cartilage segmentation scores of DSC = 95.81% and DSC = 96.33% for the CT and COL blending, respectively.
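For readers unfamiliar with hypergraph convolutions, a single plain hypergraph convolution step, the operator such models typically build on, can be sketched as below. The toy sizes are arbitrary, and MVHGCN's multi-view attention fusion and local/global blending are not reproduced.

```python
import torch

def hypergraph_conv(X, H, Theta, edge_w=None):
    """One standard hypergraph convolution step:
       X' = relu(Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta)
    X: (N, F) node features, H: (N, E) incidence matrix, Theta: (F, F_out)."""
    N, E = H.shape
    w = edge_w if edge_w is not None else torch.ones(E)
    Dv = torch.diag((H @ w).clamp(min=1e-6).pow(-0.5))       # node degrees
    De = torch.diag(H.sum(dim=0).clamp(min=1e-6).pow(-1.0))  # hyperedge degrees
    W = torch.diag(w)
    return torch.relu(Dv @ H @ W @ De @ H.t() @ Dv @ X @ Theta)

# Toy example: 5 nodes, 3 hyperedges, 8-dim features mapped to 4-dim embeddings.
H = (torch.rand(5, 3) > 0.5).float()
X = torch.randn(5, 8)
Theta = torch.randn(8, 4)
print(hypergraph_conv(X, H, Theta).shape)   # torch.Size([5, 4])
```
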
Correction to: Dynamic preventive maintenance strategy for a heterogeneous multi-unit redundant system: A deep reinforcement learning approach with weighted network estimator
Deming Xu, Yan Wang, Xiang Liu, Hao Ma, Zhicheng Ji
Applied Intelligence 55(13). DOI: 10.1007/s10489-025-06804-8. Published 2025-08-28.
Deep reinforcement learning with graph attention mechanism for vehicle routing problem with time windows
Fan Zhang, Huiling Hu, Yuqian Zhao
Applied Intelligence 55(13). DOI: 10.1007/s10489-025-06829-z. Published 2025-08-27.

Abstract: As the logistics industry expands, the complexity of vehicle routing problems, particularly those with time window constraints, increases with the growing demand for services. The challenge of the vehicle routing problem with time windows (VRPTW) lies in efficiently scheduling a fleet of vehicles to serve a set of customers within specified time frames. This study introduces an attention-based deep reinforcement learning approach to optimize vehicle routing and scheduling, aiming to meet customers' time window requirements while effectively reducing travel distances and costs, thereby enhancing the efficiency of logistics delivery. The method models the problem as a Markov decision process, defines actions, states, and rewards, and uses reinforcement learning during training to extract node features and generate preliminary solutions. By introducing an encoder-decoder structure and a graph attention network, the model can focus on key information and optimize strategy selection. A large neighborhood search algorithm is then used to iteratively improve the initial solution toward the optimal solution. The model is trained and tested on the Solomon dataset, and the experimental results show that it significantly outperforms other methods.
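The time-window bookkeeping that the MDP reward must capture can be illustrated with a small feasibility and cost check for a single route. The data layout and the waiting-when-early rule below are assumptions for illustration, not the paper's environment definition.

```python
import math

def evaluate_route(route, coords, windows, service_time=0.0):
    """Check time-window feasibility of one route (depot = node 0) and return
    (feasible, total_travel_distance). Waiting is allowed when arriving early."""
    t, dist, prev = 0.0, 0.0, 0
    for node in route + [0]:            # return to the depot at the end
        leg = math.dist(coords[prev], coords[node])
        dist += leg
        t += leg                        # travel time = Euclidean distance here
        ready, due = windows[node]
        if t > due:                     # arrived after the window closes
            return False, dist
        t = max(t, ready) + service_time
        prev = node
    return True, dist

coords = {0: (0, 0), 1: (2, 1), 2: (4, 3)}
windows = {0: (0, 100), 1: (0, 10), 2: (5, 12)}
print(evaluate_route([1, 2], coords, windows, service_time=1.0))
```
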