Applied Intelligence: Latest Articles

SDDP: sensitive data detection method for user-controlled data pricing
IF 3.4, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 6, Pub Date: 2025-03-19, DOI: 10.1007/s10489-025-06229-3
Yuchuan Hu, Bitao Hu, Bing Guo, Cheng Dai, Yan Shen
Abstract: In the era of big data, there is an urgent need for data sharing, in which data pricing is a crucial issue, because a reasonable price can not only enhance the willingness of users to share data but also promote the progress of data sharing. However, current research is mostly approached from the perspective of data sharing platforms, treating all data equally without sufficient evaluation of the sensitive data within shared datasets or of the users' own personalized perception of privacy. To address this problem, we detected sensitive data in each piece of data and then defined the pricing function based on information entropy and the user’s perception of sensitive information. To enhance the accuracy of sensitive data detection, we integrated an attention mechanism into a pre-trained model to comprehensively represent the samples. Subsequently, on the basis of automatically generating label correlation vectors to calculate the correlation matrix, a graph convolutional neural network was employed to mine the correlation between labels. Furthermore, based on the detection results, information entropy and user ratings are reasonably mapped to prices. Pricing based on user ratings is more suitable for personal data than for government or institutional data. Experimental results on a dataset of Twitter text sent by users demonstrate that the average precision of our sensitive data detection model improves by up to 9.26% over the comparison models, and that SDDP can provide reasonable pricing for samples containing sensitive data and fair compensation for users.
Citations: 0
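The SDDP abstract above says that prices are derived from information entropy and user ratings, but it does not give the pricing function. The following is only a minimal sketch of that idea under stated assumptions: the function names, the token-level entropy, the rating range of 0 to 1, and the multiplicative form are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def price(tokens, user_rating, base_price=1.0):
    """Hypothetical pricing rule, NOT the paper's formula.

    Scales a base price by the sample's entropy and by the owner's own
    sensitivity rating, assumed here to lie in [0, 1].
    """
    if not 0.0 <= user_rating <= 1.0:
        raise ValueError("user_rating must be in [0, 1]")
    return base_price * shannon_entropy(tokens) * (1.0 + user_rating)
```

With this rule, a sample whose tokens are more evenly spread (higher entropy) and that its owner rates as more sensitive receives a higher price; any monotone mapping would serve the same illustrative purpose.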
HKGAT: heterogeneous knowledge graph attention network for explainable recommendation system
IF 3.4, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 6, Pub Date: 2025-03-19, DOI: 10.1007/s10489-025-06446-w
Yongchuan Zhang, Jiahong Tian, Jing Sun, Huirong Chan, Agen Qiu, Cailin Liu
Abstract: This paper presents the Heterogeneous Knowledge Graph Attention Network (HKGAT) for recommendation systems. As recommendation technology evolves, systems now emphasize diversity, fairness, and explainability alongside accuracy. Traditional methods encounter issues integrating knowledge graphs and lack explainability. HKGAT addresses these by leveraging heterogeneous knowledge graphs. It consists of a heterogeneous information aggregation layer, an attention-aware heterogeneous relation fusion layer, and a prediction layer. First, recommendation data forms a user-item knowledge graph. Then, the aggregation layer collects relation information, followed by the fusion layer integrating it for higher-order feature representations. The prediction layer combines link prediction and recommendation score prediction. Additionally, paths of top-ten results are analyzed and quantified for explainability to optimize ranking. Experiments on self-constructed and Amazon-book datasets show HKGAT outperforms baselines like HetGCN, with significant improvements in Precision, Recall, F1 score, and NDCG@10, and a notable 1.9% gain in NDCG@10 from explainable ranking optimization.
Citations: 0
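The HKGAT abstract above reports results in NDCG@10. As a reminder of what that metric measures, here is a small sketch of the standard definition with a log2 discount. It assumes the ideal ranking is obtained by sorting the relevance labels of the returned list itself, which is a common simplification; the paper's exact evaluation protocol is not stated in the abstract.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k ranked items."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the best possible ordering.

    Assumption: the ideal ordering is the same labels sorted descending.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A list whose relevant items already sit at the top scores 1.0, and moving a relevant item down the list lowers the score because of the logarithmic discount.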
ISL-Net: dual-stream interaction network with task-optimized modules for more accurate, complete iris segmentation and localization
IF 3.4, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 6, Pub Date: 2025-03-19, DOI: 10.1007/s10489-024-05862-8
Lei He, Xiaokai Yang, Jian Zheng, Zhaobang Liu, Xiaoguo Yang
Abstract: Iris images captured in uncooperative and unconstrained environments pose significant challenges for iris segmentation and localization owing to factors including high occlusions, specular reflections, motion blur, iris rotation, and off-angle images. To address this challenge, this paper proposes ISL-Net, a multitask segmentation network with a task-optimization module based on deep learning for joint iris segmentation and localization. We developed a dual-stream interactive module (DSIM) that combines dual-stream decoders to facilitate information exchange between tasks without interference. To optimize the iris-segmentation and iris-localization performance, we incorporated a balanced attention module (BAM) and a boundary-enhancement module (BEM) in the skip connections of the respective task stream decoders. The BEM recovers missing boundaries in iris localization, while the BAM focuses on uncertain areas in iris segmentation, enhancing the model’s ability to handle these regions. These modules complement each other, improving overall system performance without interference. The proposed model was evaluated on three challenging iris datasets, outperforming most existing models by achieving e1 index scores of 0.34, 0.79, and 0.61% and average normalized Hausdorff distances (HDs) of 0.7221, 1.1914, and 1.0396%. The results indicate that ISL-Net can generate normalized iris images with simple post-processing, making it suitable for direct application in existing iris-recognition systems.
Citations: 0
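The ISL-Net abstract above evaluates localization with a normalized Hausdorff distance. The sketch below shows only the plain symmetric Hausdorff distance between two boundary point sets; the normalization the paper applies is not described in the abstract and is deliberately left out. The function names are illustrative.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

Two identical boundaries give a distance of 0, and a single stray boundary point far from the reference contour dominates the score, which is why the measure is sensitive to missing or spurious boundary segments.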
A scale-cross non-local network with higher-level semantics guidance for smoke segmentation
IF 3.4, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 6, Pub Date: 2025-03-19, DOI: 10.1007/s10489-025-06420-6
Lin Zhang, Jing Wu, Yun Zhao, Feiniu Yuan
Abstract: Smoke semantic segmentation (SSS) is a particularly challenging task because of the varied appearance of the target itself, which stems from the characteristics of smoke: it is non-rigid, translucent, fuzzy, environment-sensitive, and so forth. This paper tailors a Scale-Cross Non-Local Network (SCNN) to smoke segmentation, aiming to accurately locate smoke in complex scenes. Although the non-local operation is very good at modeling long-range contextual dependencies through self-attention, its restriction to single-scale input and its suitability only for low-resolution features limit its capability for information representation. To address these issues, we design a bespoke Scale-Cross Non-Local (SCNL) module to better integrate local features with global dependencies. In practical scenes, diverse non-smoke objects that resemble smoke pose great obstacles to accurate smoke localization. As a solution, we design a Pyramid Irregular Convolution (PIC) module containing rich high-level semantics to further refine the feature representation of the segmentation task. By supervising a classification task, the high-level semantics obtained can guide the segmentation features to correct semantic errors at the image level and alleviate the issue of between-class similarity. To assess its generalization ability, we empirically evaluate our SCNN on extensive synthetic and real data. Experimental results demonstrate that SCNN achieves state-of-the-art performance, exhibiting enhanced smoke localization, accuracy in boundary detection, and a significant reduction in the false segmentation rate for smoke-like objects.
Citations: 0
Ship pipeline defect detection method based on deep learning and transfer fusion of ultrasonic guided wave signals
IF 3.4, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 6, Pub Date: 2025-03-18, DOI: 10.1007/s10489-025-06390-9
Ruoli Tang, Yongzhe Li, Shangyu Zhang
Abstract: Ultrasonic guided waves (UGW) hold great promise for structural health monitoring (SHM) of pipeline structures. However, the inherent complexity of pipeline defect features within the UGW makes the intuitive and accurate identification of defects based only on UGW signals challenging. In addition, the existing neural network-based UGW signal recognition methods require a large number of defect waveform samples, which limits their applicability. This study proposes a signal recognition method based on deep learning and sample transfer fusion for the identification of UGW signals in ship pipelines, allowing their potential defects to be detected accurately. A time–frequency imaging algorithm for ship pipeline UGW signals is first introduced using the continuous wavelet transform (CWT) to capture their time–frequency characteristics. Leveraging transfer learning, UGW signal samples from various operational scenarios of onshore oil pipelines are then fused to pre-train the GoogLeNet convolutional neural network (CNN) model. Finally, the pre-trained GoogLeNet model is fine-tuned with ship pipeline UGW signal samples, which allows the underlying defects to be detected accurately. The experimental results demonstrate that the proposed method significantly increases the classification accuracy of ship pipeline defects compared with non-transfer learning methods and time-domain imaging. More precisely, the accuracy increases from 63.3% to 97.3%. Furthermore, the obtained results show that the proposed method has high robustness.
Citations: 0
FA3-Net: feature aggregation and augmentation with attention network for sound event localization and detection
IF 3.4, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 6, Pub Date: 2025-03-18, DOI: 10.1007/s10489-025-06437-x
Chuan Wang, Qinghua Huang
Abstract: Sound event localization and detection (SELD) aims to identify the category and duration of sound events (SED) while also estimating their respective direction of arrival (DOA). This multi-task problem presents unique challenges, as the features required for SED and DOA tasks are not entirely aligned. Consequently, incomplete feature extraction and suboptimal feature fusion often hinder performance. To address these issues, we propose a feature aggregation and augmentation with attention network (FA3-Net). FA3-Net consists of two main components: the feature aggregation and augmentation with attention (FA3) module and the Conformer module. The FA3 module plays a critical role in fusing and enhancing high-level features and is specifically designed to efficiently handle the distinct requirements of SED and DOA tasks. It ensures that task-specific features are extracted effectively, while also improving feature discriminability and reducing confusion. The feature aggregation residual block (FAResBlock), a component of the FA3 module, handles task-specific feature aggregation, while the feature augmentation with attention block (FAA block) enhances feature representation across multiple dimensions. The Conformer module is employed to model the temporal sequence, as it excels in capturing both local and global dependencies, making it ideal for comprehensive time sequence analysis. Finally, to overcome data limitations, audio channel swapping (ACS) is employed. Experiments on the STARSS23 dataset, DCASE2021 dataset and L3DAS22 dataset show that FA3-Net significantly outperforms other models in both feature aggregation and augmentation, while also being more efficient and lightweight. The code is available at: https://github.com/wangchuan11111111/FA3-NET
Citations: 0
Improved DAB-DETR model for irregular traffic obstacles detection in vision based driving environment perception scenario
IF 3.4, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 6, Pub Date: 2025-03-18, DOI: 10.1007/s10489-025-06440-2
Junchao Yang, Hui Zhang, Yuting Zhou, Zhiwei Guo, Feng Lin
Abstract: Machine-vision-based recognition of irregular traffic obstacles plays a pivotal role in autonomous driving and Advanced Driver Assistance Systems (ADAS) by providing the necessary environment perception capabilities. Traditional models for recognizing irregular traffic obstacles suffer from difficulties with small target detection, poor performance in diverse environmental conditions, and computational complexity. This work addresses the critical issue of recognizing irregular traffic obstacles in roadway environments. We present an enhanced target detection model based on the Dynamic Anchor Boxes-recognition Transformer (DAB-DETR). The original model’s structure was limited in expressing relative positional information between features due to its reliance on absolute position encoding. To overcome this limitation, the improved DAB-DETR incorporates relative position encoding within the multi-headed self-attention mechanism of the Transformer encoder. Additionally, we propose a novel Average Precision (AP) loss function that unifies classification and localization losses into a single parameterized formula, addressing performance degradation observed in the original model. Experimental results demonstrate significant improvements in detection accuracy for irregular traffic objects, showcasing the effectiveness of the proposed enhancements. According to the testing results, the improved DAB-DETR model’s detection accuracy is 82.00% at an Intersection over Union (IoU) threshold of 0.5, which is 3.3% better than the original model and 6.20% and 7.71% better than the conventional models YOLOv5 and Faster R-CNN, respectively.
Citations: 0
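The DAB-DETR abstract above quotes detection accuracy at an IoU of 0.5. For readers unfamiliar with the criterion, the sketch below computes Intersection over Union for two axis-aligned boxes and applies the threshold. The `(x1, y1, x2, y2)` corner format and the helper name `is_true_positive` are assumptions made for illustration; the paper's evaluation code is not shown in the abstract.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts as correct when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold
```

Under this criterion a predicted box that covers only half of the combined area of prediction and ground truth sits exactly on the acceptance boundary.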
Multi-dimensional requirements for reinforcement recommendation reasoning
IF 3.4, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 6, Pub Date: 2025-03-18, DOI: 10.1007/s10489-024-05854-8
Yinggang Li, Xiangrong Tong, Zhongming Lv
Abstract: Personalized recommendation systems not only need to improve the accuracy of recommendations, but also need to focus on the variety and novelty of recommendations to improve user satisfaction. Currently, most existing recommendation systems focus on improving the accuracy and diversity of recommended items. However, they usually do not consider the original user needs, and the potential relationship between diversity and novelty is not deeply explored. In addition to accuracy and diversity, we also consider novelty, analyze the relationship between diversity and novelty (where they coincide and where they differ), and propose an explainable recommendation system that integrates multiple (multidimensional) requirements such as accuracy, diversity, and novelty. The model combines semantic relations of knowledge graphs and multi-hop inference so as to analyze and consider the diversity and novelty requirements of users. Meanwhile, a recurrent neural network is used to construct a temporal multi-label classification network to predict users’ multidimensional demands and capture the dependencies between diversity and novelty demands. Finally, a composite reward function, including accuracy reward, diversity reward and novelty reward, is designed to implement a multi-demand, multi-decision recommendation method. Experiments are conducted on three real-world datasets, and the experimental results show that the model can guarantee accuracy while improving the diversity and novelty of recommended items.
Citations: 0
Federated learning-based road surveillance system in distributed CCTV environment: Pedestrian fall recognition using spatio-temporal attention networks
IF 3.4, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 6, Pub Date: 2025-03-18, DOI: 10.1007/s10489-025-06451-z
Byeonghun Kim, Jaegyun Im, Byeongjoon Noh
Abstract: Intelligent CCTV systems are highly effective in monitoring pedestrian and vehicular traffic and identifying anomalies in the roadside environment. In particular, it is necessary to develop an effective recognition system to address the problem of pedestrian falls, which is a major cause of injury in road traffic environments. However, the existing systems have challenges such as communication constraints and performance instability. In this paper, we propose a novel fall recognition system based on Federated Learning (FL) to solve these challenges. The proposed system utilizes a GAT combined with LSTM and attention layers to extract spatio-temporal features, which can more accurately identify pedestrian falls. Each road CCTV works as an independent client to generate local data, and the server aggregates these models to learn a global model. This ensures robust operation in different views and environments, and solves the bottleneck of data communication and security challenges. We validated the feasibility and applicability of the FL-based fall recognition method by implementing the prototype and applying it to the UP-FALL benchmark dataset, which is widely used for fall recognition. Code has been made available at: https://github.com/Kim-Byeong-Hun/Fed-PFR.
Citations: 0
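The federated fall-recognition abstract above says the server aggregates client models into a global model, without naming the aggregation rule. One widely used rule is federated averaging (FedAvg), sketched below on flat parameter lists. Treating FedAvg as the rule is an assumption made for illustration, as are the function name and the flat-list representation; the linked repository is the authoritative source for what the authors actually do.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of local training samples held by each client.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

Only parameters and sample counts cross the network in this scheme, which is the property that lets each camera keep its raw video local.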
Multi-scale dual-stream visual feature extraction and graph reasoning for visual question answering
IF 3.4, Zone 2, Computer Science
Applied Intelligence, vol. 55, no. 6, Pub Date: 2025-03-18, DOI: 10.1007/s10489-025-06325-4
Abdulganiyu Abdu Yusuf, Chong Feng, Xianling Mao, Xinyan Li, Yunusa Haruna, Ramadhani Ally Duma
Abstract: Recent advancements in deep learning algorithms have significantly expanded the capabilities of systems to handle vision-to-language (V2L) tasks. Visual question answering (VQA) presents challenges that require a deep understanding of visual and language content to perform complex reasoning tasks. The existing VQA models often rely on grid-based or region-based visual features, which capture global context and object-specific details, respectively. However, balancing the complementary strengths of each feature type while minimizing fusion noise remains a significant challenge. This study proposes a multi-scale dual-stream visual feature extraction method that combines grid and region features to enhance both global and local visual feature representations. In addition, a visual graph relational reasoning (VGRR) approach is proposed to further improve reasoning by constructing a graph that models spatial and semantic relationships between visual objects, using Graph Attention Networks (GATs) for relational reasoning. To enhance the interaction between visual and textual modalities, we further propose a cross-modal self-attention fusion strategy, which enables the model to focus selectively on the most relevant parts of both the image and the question. The proposed model is evaluated on the VQA 2.0 and GQA benchmark datasets, demonstrating competitive performance with significant accuracy improvements compared to state-of-the-art methods. Ablation studies confirm the effectiveness of each module in enhancing visual-textual understanding and answer prediction.
Citations: 0