Visual Computing for Industry, Biomedicine and Art: Latest Publications

Graph neural network-tracker: a graph neural network-based multi-sensor fusion framework for robust unmanned aerial vehicle tracking.
IF 3.2 · CAS Q4 · Computer Science
Visual Computing for Industry, Biomedicine and Art · Pub Date: 2025-07-16 · DOI: 10.1186/s42492-025-00200-2
Karim Dabbabi, Tijeni Delleji
Unmanned aerial vehicle (UAV) tracking is a critical task in surveillance, security, and autonomous navigation applications. In this study, we propose graph neural network-tracker (GNN-tracker), a novel GNN-based UAV tracking framework that effectively integrates graph-based spatial-temporal modelling, Transformer-based feature extraction, and multi-sensor fusion to enhance tracking robustness and accuracy. Unlike traditional tracking approaches, GNN-tracker dynamically constructs a spatiotemporal graph representation, improving identity consistency and reducing tracking errors in occlusion-heavy scenarios. Experimental evaluations on optical, thermal, and fused UAV datasets demonstrate the superiority of GNN-tracker (fused) over state-of-the-art methods. The proposed model achieves multiple object tracking accuracy (MOTA) scores of 91.4% (fused), 89.1% (optical), and 86.3% (thermal), surpassing TransT by 8.9% in MOTA and 7.7% in higher order tracking accuracy (HOTA). HOTA scores of 82.3% (fused), 80.1% (optical), and 78.7% (thermal) validate its strong object association capabilities, while frame rates of 58.9 (fused), 56.8 (optical), and 54.3 (thermal) frames per second ensure real-time performance. Additionally, ablation studies confirm the essential role of graph-based modelling and multi-sensor fusion, with performance drops of up to 8.9% in MOTA when these components are removed. Thus, GNN-tracker (fused) offers a highly accurate, robust, and efficient UAV tracking solution, effectively addressing real-world challenges across diverse environmental conditions and multiple sensor modalities.
Vol. 8, no. 1, p. 18 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12267811/pdf/
Citations: 0
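The abstract gives no implementation details, but the core idea of dynamically constructing a spatiotemporal graph over per-frame detections can be sketched as follows. The `Detection` fields, the distance thresholds, and the edge rules are illustrative assumptions, not the authors' actual design; in GNN-tracker such edges would then feed a GNN for association.

```python
# Minimal sketch of dynamic spatiotemporal graph construction for
# multi-object tracking. All names and thresholds are illustrative
# assumptions, not the published design.
from dataclasses import dataclass, field
from itertools import combinations
import math

@dataclass(frozen=True)
class Detection:
    frame: int          # frame index
    x: float            # detection center x (pixels)
    y: float            # detection center y (pixels)
    feature_id: int     # index into an appearance-feature bank (hypothetical)

@dataclass
class SpatioTemporalGraph:
    nodes: list = field(default_factory=list)
    spatial_edges: list = field(default_factory=list)   # same-frame pairs
    temporal_edges: list = field(default_factory=list)  # consecutive-frame pairs

def build_graph(detections, spatial_radius=50.0, temporal_radius=80.0):
    """Connect nearby detections within a frame (spatial edges) and
    across consecutive frames (temporal edges, i.e., track hypotheses)."""
    g = SpatioTemporalGraph(nodes=list(detections))
    for a, b in combinations(detections, 2):
        dist = math.hypot(a.x - b.x, a.y - b.y)
        if a.frame == b.frame and dist <= spatial_radius:
            g.spatial_edges.append((a, b))
        elif abs(a.frame - b.frame) == 1 and dist <= temporal_radius:
            g.temporal_edges.append((a, b))
    return g

dets = [Detection(0, 10, 10, 0), Detection(0, 40, 12, 1), Detection(1, 14, 11, 2)]
g = build_graph(dets)
print(len(g.spatial_edges), len(g.temporal_edges))  # 1 2
```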
Placenta segmentation redefined: review of deep learning integration of magnetic resonance imaging and ultrasound imaging.
IF 3.2 · CAS Q4 · Computer Science
Visual Computing for Industry, Biomedicine and Art · Pub Date: 2025-07-15 · DOI: 10.1186/s42492-025-00197-8
Asmaa Jittou, Khalid El Fazazy, Jamal Riffi
Placental segmentation is critical for the quantitative analysis of prenatal imaging applications. However, segmenting the placenta using magnetic resonance imaging (MRI) and ultrasound is challenging because of variations in fetal position, dynamic placental development, and image quality. Most segmentation methods define regions of interest with different shapes and intensities, encompassing the entire placenta or specific structures. Recently, deep learning has emerged as a key approach that offers high segmentation performance across diverse datasets. This review focuses on recent advances in deep learning techniques for placental segmentation in medical imaging, specifically the MRI and ultrasound modalities, covering studies from 2019 to 2024. It synthesizes recent research, expands knowledge in this innovative area, and highlights the potential of deep learning approaches to significantly enhance prenatal diagnostics. The findings emphasize the importance of selecting appropriate imaging modalities and model architectures tailored to specific clinical scenarios. In addition, integrating both MRI and ultrasound can enhance segmentation performance by leveraging complementary information. The review also discusses the challenges associated with the high costs and limited availability of advanced imaging technologies. It provides insights into the current state of placental segmentation techniques and their implications for improving maternal and fetal health outcomes, underscoring the transformative impact of deep learning on prenatal diagnostics.
Vol. 8, no. 1, p. 17 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12263505/pdf/
Citations: 0
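The review's point that MRI and ultrasound carry complementary information is commonly operationalized as a multi-branch network with one encoder per modality. A minimal late-fusion sketch in PyTorch, assuming co-registered single-channel 2D slices (an assumption; none of the reviewed architectures is reproduced here):

```python
# Minimal sketch of dual-modality (MRI + ultrasound) late fusion for
# segmentation. The architecture is illustrative only.
import torch
import torch.nn as nn

class TwoModalitySegNet(nn.Module):
    def __init__(self):
        super().__init__()
        # One small convolutional encoder per modality.
        self.mri_enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.us_enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        # Fusion head: concatenate channel-wise, predict a mask logit map.
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, mri, us):
        fused = torch.cat([self.mri_enc(mri), self.us_enc(us)], dim=1)
        return self.head(fused)  # per-pixel placenta logits

net = TwoModalitySegNet()
mask_logits = net(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64))
print(mask_logits.shape)  # torch.Size([1, 1, 64, 64])
```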
Active interaction strategy generation for human-robot collaboration based on trust.
IF 3.2 · CAS Q4 · Computer Science
Visual Computing for Industry, Biomedicine and Art · Pub Date: 2025-06-23 · DOI: 10.1186/s42492-025-00198-7
Yujie Guo, Pengfei Yi, Xiaopeng Wei, Dongsheng Zhou
In human-robot collaborative tasks, human trust in robots can reduce resistance to them, thereby increasing the success rate of task execution. However, most existing studies have focused on improving the success rate of human-robot collaboration (HRC) rather than on enhancing collaboration efficiency. To improve overall collaboration efficiency while maintaining a high success rate, this study proposes a trust-based active interaction strategy generation method for HRC. First, a trust-based optimal robot strategy generation method was proposed to generate the robot's optimal strategy in an HRC task. This method employs a tree to model the HRC process under different robot strategies and calculates the optimal strategy based on the modeling results for the robot to execute. Second, the robot's performance was evaluated to calculate the human's trust in the robot; a robot performance evaluation method based on a visual language model was also proposed, and its evaluation results were input into the trust model to compute the human's current trust. Finally, each time an object operation was completed, the robot performance evaluation and optimal strategy generation methods worked together to automatically generate the robot's optimal strategy for the next step until the entire collaborative task was completed. The experimental results demonstrate that this method significantly improves collaborative efficiency while achieving a high success rate in HRC.
Vol. 8, no. 1, p. 16 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12185789/pdf/
Citations: 0
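As a rough illustration of the loop the abstract describes (evaluate performance, update trust, re-derive the optimal strategy for the next step), here is a heavily simplified sketch. The utility function, the trust-update rule, and the two candidate strategies are invented for illustration; the paper's tree model and visual-language-model evaluator are not reproduced.

```python
# Minimal sketch of trust-driven strategy selection in an HRC loop.
# All numbers and formulas are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    autonomy: float       # 0 = fully human-guided, 1 = fully autonomous
    success_prob: float   # robot's own chance of completing the step
    duration: float       # expected step time in seconds

def expected_utility(strategy, trust):
    # A more autonomous move pays off only if the human accepts it,
    # modelled here as acceptance scaling with current trust.
    acceptance = 1.0 - strategy.autonomy * (1.0 - trust)
    return acceptance * strategy.success_prob / strategy.duration

def update_trust(trust, succeeded, gain=0.1, loss=0.2):
    # Bounded update driven by the latest performance evaluation.
    return min(1.0, trust + gain) if succeeded else max(0.0, trust - loss)

strategies = [Strategy("hand over directly", 1.0, 0.95, 3.0),
              Strategy("place on the table", 0.2, 0.90, 6.0)]
trust = 0.3
for step_succeeded in (True, True, False):
    best = max(strategies, key=lambda s: expected_utility(s, trust))
    print(f"trust={trust:.2f} -> next strategy: {best.name}")
    trust = update_trust(trust, step_succeeded)
```

With these toy numbers the robot starts with the conservative strategy and switches to direct handover once trust has grown, which is the qualitative behavior the trust-based approach aims for.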
Avatars in the educational metaverse.
IF 3.2 · CAS Q4 · Computer Science
Visual Computing for Industry, Biomedicine and Art · Pub Date: 2025-06-10 · DOI: 10.1186/s42492-025-00196-9
Md Zabirul Islam, Ge Wang
Avatars in the educational metaverse are revolutionizing the learning process by providing interactive and effective learning experiences. These avatars enable students to engage in realistic scenarios, work in groups, and develop essential skills using adaptive and intelligent technologies. The purpose of this review is to evaluate the contribution of avatars to education. It investigates the use of avatars to enhance learning by offering individualized experiences and supporting collaborative group activities in virtual environments. It also analyzes recent progress in artificial intelligence, especially natural language processing and generative models, which have significantly improved avatar capabilities. In addition, it reviews their use in customized learning, contextual teaching, and virtual simulations to improve student participation and achievement. The study also highlights issues impacting implementation, including data security, ethical concerns, and limited infrastructure. The paper ends with implications and recommendations for future research in this field.
Vol. 8, no. 1, p. 15 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12151956/pdf/
Citations: 0
Radiographic prediction model based on X-rays predicting anterior cruciate ligament function in patients with knee osteoarthritis.
IF 3.2 · CAS Q4 · Computer Science
Visual Computing for Industry, Biomedicine and Art · Pub Date: 2025-06-06 · DOI: 10.1186/s42492-025-00195-w
Guanghan Gao, Yaonan Zhang, Lei Shi, Lin Wang, Fei Wang, Qingyun Xue
Knee osteoarthritis (KOA) is a prevalent chronic condition in the elderly and is often associated with instability caused by anterior cruciate ligament (ACL) degeneration. The functional integrity of the ACL is crucial for the diagnosis and treatment of KOA. Radiographic imaging is a practical diagnostic tool for predicting the functional status of the ACL; however, the precision of current evaluation methodologies remains suboptimal. Consequently, we aimed to identify additional radiographic features from X-ray images that could predict ACL function in a larger cohort of patients with KOA. A retrospective analysis was conducted on 272 patients whose ACL function was verified intraoperatively between October 2021 and October 2024. The patients were categorized into ACL-functional and ACL-dysfunctional groups. Using least absolute shrinkage and selection operator regression and logistic regression, four significant radiographic predictors were identified: location of the deepest wear on the medial tibial plateau (middle and posterior), wear depth in the posterior third of the medial tibial plateau (> 1.40 mm), posterior tibial slope (PTS > 7.90°), and static anterior tibial translation (> 4.49 mm). A clinical prediction model was developed and visualized using a nomogram, with calibration curves and receiver operating characteristic analysis used to confirm model performance. The prediction model demonstrated strong discriminative ability, with area under the curve values of 0.831 (88.4% sensitivity, 63.8% specificity) and 0.907 (86.1% sensitivity, 82.2% specificity) in the training and validation cohorts, respectively. Consequently, the authors established an efficient approach for accurate evaluation of ACL function in KOA patients.
Vol. 8, no. 1, p. 14 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12143998/pdf/
Citations: 0
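The modelling pipeline itself (logistic regression over four radiographic predictors, assessed by AUC) is standard and can be sketched with scikit-learn. The data below are synthetic stand-ins, the LASSO feature-selection step is omitted, and the resulting AUC has nothing to do with the study's results:

```python
# Minimal sketch of a four-predictor logistic regression evaluated by AUC,
# on synthetic data. Coefficients and scores are not the study's.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 272
# Four predictors: wear location (0=middle, 1=posterior), wear depth (mm),
# posterior tibial slope (deg), static anterior tibial translation (mm).
X = np.column_stack([
    rng.integers(0, 2, n),
    rng.normal(1.4, 0.5, n),
    rng.normal(7.9, 2.0, n),
    rng.normal(4.5, 1.5, n),
])
# Synthetic label loosely tied to the predictors (ACL-dysfunctional = 1).
logit = (0.8 * X[:, 0] + 1.2 * (X[:, 1] - 1.4)
         + 0.3 * (X[:, 2] - 7.9) + 0.5 * (X[:, 3] - 4.5))
y = (logit + rng.normal(0, 1, n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```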
Artificial intelligence-assisted diagnosis of early allograft dysfunction based on ultrasound image and data.
IF 3.2 · CAS Q4 · Computer Science
Visual Computing for Industry, Biomedicine and Art · Pub Date: 2025-05-12 · DOI: 10.1186/s42492-025-00192-z
Yaqing Meng, Mingyang Wang, Ningning Niu, Haoyan Zhang, Jinghan Yang, Guoying Zhang, Jing Liu, Ying Tang, Kun Wang
Early allograft dysfunction (EAD) significantly affects liver transplantation prognosis. This study evaluated the effectiveness of artificial intelligence (AI)-assisted methods in accurately diagnosing EAD and identifying its causes. The primary metric for assessing accuracy was the area under the receiver operating characteristic curve (AUC). Accuracy, sensitivity, and specificity were calculated and analyzed to compare the performance of the AI models with each other and with radiologists. EAD classification followed the criteria established by Olthoff et al. A total of 582 liver transplant patients who underwent transplantation between December 2012 and June 2021 were selected; among these, 117 patients (mean age 33.5 ± 26.5 years, 80 men) were evaluated. The ultrasound parameters, images, and clinical information of patients were extracted from the database to train the AI model. The AUC for the ultrasound-spectrogram fusion network constructed from four ultrasound images and medical data was 0.968 (95%CI: 0.940, 0.991), outperforming radiologists by 30% across all metrics. AI assistance significantly improved diagnostic accuracy, sensitivity, and specificity (P < 0.050) for both experienced and less-experienced physicians. EAD has lacked efficient methods for diagnosis and causation analysis; the integration of AI and ultrasound enhances both. By modeling only images and data related to blood flow, the AI model effectively analyzed patients with EAD caused by abnormal blood supply. Our model can assist radiologists in reducing judgment discrepancies, potentially benefitting patients with EAD in underdeveloped regions. Furthermore, it enables targeted treatment for those with abnormal blood supply.
Vol. 8, no. 1, p. 13 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12069173/pdf/
Citations: 0
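A network that fuses several ultrasound views with tabular clinical data is typically built as per-view image encoders concatenated with a tabular branch. A minimal PyTorch sketch under that assumption follows; the published ultrasound-spectrogram fusion network's actual layers are not shown in this listing, so everything here is illustrative:

```python
# Minimal sketch of an image + tabular fusion classifier. Layer sizes and
# the fusion scheme are assumptions, not the published architecture.
import torch
import torch.nn as nn

class UltrasoundFusionNet(nn.Module):
    def __init__(self, n_images=4, n_clinical=8):
        super().__init__()
        # One shared encoder applied to each ultrasound view.
        self.img_enc = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.tab_enc = nn.Sequential(nn.Linear(n_clinical, 16), nn.ReLU())
        self.classifier = nn.Linear(8 * n_images + 16, 1)  # EAD logit

    def forward(self, images, clinical):
        # images: (batch, n_images, H, W); encode each view independently.
        feats = [self.img_enc(images[:, i:i + 1]) for i in range(images.shape[1])]
        fused = torch.cat(feats + [self.tab_enc(clinical)], dim=1)
        return self.classifier(fused)

net = UltrasoundFusionNet()
logit = net(torch.randn(2, 4, 128, 128), torch.randn(2, 8))
print(logit.shape)  # torch.Size([2, 1])
```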
Graph visualization efficiency of popular web-based libraries.
IF 3.2 · CAS Q4 · Computer Science
Visual Computing for Industry, Biomedicine and Art · Pub Date: 2025-05-08 · DOI: 10.1186/s42492-025-00193-y
Xin Zhao, Xuan Wang, Xianzhe Zou, Huiming Liang, Genghuai Bai, Ning Zhang, Xin Huang, Fangfang Zhou, Ying Zhao
Web-based libraries, such as D3.js, ECharts.js, and G6.js, are widely used to generate node-link graph visualizations. These libraries allow users to call application programming interfaces (APIs) without knowing the details of the encapsulated techniques, such as graph layout algorithms and graph rendering methods. Efficiency requirements, such as visualizing a graph with 3k nodes and 4k edges within 1 min at a frame rate of 30 fps, are crucial for selecting a proper library, because libraries generally exhibit different characteristics owing to the diversity of encapsulated techniques. However, existing studies have mainly focused on verifying the advantages of a new layout algorithm or rendering method from a theoretical viewpoint, independent of specific web-based libraries. Their conclusions are difficult for end users to understand and utilize, so a trial-and-error selection process is typically required. This study addresses this gap by conducting an empirical experiment to evaluate the performance of web-based libraries. The experiment involves popular libraries and hundreds of graph datasets covering node scales from 100 to 200k and edge-to-node ratios from 1 to 10 (including complete graphs). The experimental results are the time costs and frame rates recorded when using the libraries to visualize the datasets. The authors analyze the performance characteristics of each library in depth based on the results and organize the results and findings into application-oriented guidelines. Additionally, they present three usage cases to illustrate how the guidelines can be applied in practice. These guidelines offer user-friendly and reliable recommendations, aiding users in quickly selecting the desired web-based libraries based on their specific efficiency requirements for node-link graph visualizations.
Vol. 8, no. 1, p. 12 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12061801/pdf/
Citations: 0
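The benchmark's datasets are parameterized by node scale and edge-to-node ratio. Generating such a graph is straightforward, as in the sketch below; the study's exact sampling scheme is not stated here, so uniform random edges are an assumption. The resulting node and edge lists would then be serialized (e.g., to JSON) and loaded by D3.js, ECharts.js, or G6.js for timing.

```python
# Minimal sketch of generating node-link benchmark graphs with a target
# node scale and edge-to-node ratio. The sampling scheme is an assumption.
import random

def random_graph(n_nodes, edge_node_ratio, seed=0):
    """Sample a simple undirected graph with ratio * n_nodes edges."""
    rng = random.Random(seed)
    target = int(edge_node_ratio * n_nodes)
    max_edges = n_nodes * (n_nodes - 1) // 2  # cap at the complete graph
    edges = set()
    while len(edges) < min(target, max_edges):
        a, b = rng.sample(range(n_nodes), 2)
        edges.add((min(a, b), max(a, b)))    # dedupe undirected pairs
    return list(range(n_nodes)), sorted(edges)

for scale in (100, 1_000, 10_000):
    nodes, edges = random_graph(scale, edge_node_ratio=4)
    print(len(nodes), len(edges))  # 100 400 / 1000 4000 / 10000 40000
```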
Artificial intelligence in retinal image analysis for hypertensive retinopathy diagnosis: a comprehensive review and perspective.
IF 3.2 · CAS Q4 · Computer Science
Visual Computing for Industry, Biomedicine and Art · Pub Date: 2025-05-01 · DOI: 10.1186/s42492-025-00194-x
Rajendra Kankrale, Manesh Kokare
Hypertensive retinopathy (HR) occurs when the choroidal vessels, which supply the photosensitive layer at the back of the eye, are injured owing to high blood pressure. Artificial intelligence (AI) in retinal image analysis (RIA) for HR diagnosis involves the use of advanced computational algorithms and machine learning (ML) strategies to automatically recognize and evaluate signs of HR in retinal images. This review aims to advance the field of HR diagnosis by investigating the latest ML and deep learning techniques and highlighting their efficacy and capability for early diagnosis and intervention. By analyzing recent advancements and emerging trends, the study seeks to inspire further innovation in automated RIA. In this context, AI shows significant potential for enhancing the accuracy, effectiveness, and consistency of HR diagnoses, which will eventually lead to better clinical results by enabling earlier intervention and precise management of the condition. Overall, the integration of AI into RIA represents a considerable step forward in the early identification and treatment of HR, offering substantial benefits to both healthcare providers and patients.
Vol. 8, no. 1, p. 11 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12044089/pdf/
Citations: 0
LViT-Net: a domain generalization person re-identification model combining local semantics and multi-feature cross fusion.
IF 3.2 · CAS Q4 · Computer Science
Visual Computing for Industry, Biomedicine and Art · Pub Date: 2025-04-16 · DOI: 10.1186/s42492-025-00190-1
Xintong Hu, Peishun Liu, Xuefang Wang, Peiyao Wu, Ruichun Tang
In the task of domain generalization person re-identification (ReID), pedestrian image features exhibit significant intra-class variability and inter-class similarity. Existing methods rely on a single feature extraction architecture and struggle to capture both global context and local spatial information, resulting in weaker generalization to unseen domains. To address this issue, an innovative domain generalization person ReID method, LViT-Net, combining local semantics and multi-feature cross fusion, is proposed. LViT-Net adopts a dual-branch encoder with a parallel hierarchical structure to extract both local and global discriminative features. In the local branch, a local multi-scale feature fusion module is designed to fuse local feature units at different scales, ensuring that fine-grained local features at various levels are accurately captured and thereby enhancing feature robustness. In the global branch, a dual feature cross fusion module fuses local features and global semantic information, focusing on critical semantic information and enabling the mutual refinement and matching of local and global features. This allows the model to achieve a dynamic balance between detailed and holistic information, forming robust feature representations of pedestrians. Extensive experiments demonstrate the effectiveness of LViT-Net: in both single-source and multi-source comparison experiments, the proposed method outperforms existing state-of-the-art methods.
Vol. 8, no. 1, p. 10 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12003221/pdf/
Citations: 0
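Cross fusion between global and local features, as in the dual feature cross fusion module, is commonly realized with cross-attention. A single-layer PyTorch sketch follows, with dimensions and the residual form chosen for illustration rather than taken from LViT-Net:

```python
# Minimal sketch of cross fusion between local and global feature tokens
# via multi-head cross-attention. Dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class CrossFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, global_tokens, local_tokens):
        # Global tokens query the local tokens, so holistic features are
        # refined by fine-grained part information (applying the module
        # symmetrically would refine local features by global context).
        refined, _ = self.attn(global_tokens, local_tokens, local_tokens)
        return self.norm(global_tokens + refined)

fuse = CrossFusion()
g = torch.randn(2, 1, 256)   # one global token per image
l = torch.randn(2, 16, 256)  # sixteen local part tokens per image
print(fuse(g, l).shape)      # torch.Size([2, 1, 256])
```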
Visual explainable artificial intelligence for graph-based visual question answering and scene graph curation.
IF 3.2 · CAS Q4 · Computer Science
Visual Computing for Industry, Biomedicine and Art · Pub Date: 2025-04-07 · DOI: 10.1186/s42492-025-00185-y
Sebastian Künzel, Tanja Munz-Körner, Pascal Tilli, Noel Schäfer, Sandeep Vidyapu, Ngoc Thang Vu, Daniel Weiskopf
This study presents a novel visualization approach to explainable artificial intelligence for graph-based visual question answering (VQA) systems. The method focuses on identifying false answer predictions by the model and offers users the opportunity to directly correct mistakes in the input space, thus facilitating dataset curation. The decision-making process of the model is demonstrated by highlighting certain internal states of a graph neural network (GNN). The proposed system is built on top of a GraphVQA framework that implements various GNN-based models for VQA trained on the GQA dataset. The authors evaluated their tool through the demonstration of identified use cases, quantitative measures, and a user study conducted with experts from the machine learning, visualization, and natural language processing domains. The findings highlight the value of the implemented features in supporting users in identifying incorrect predictions and their underlying issues. Additionally, the approach is easily extendable to similar models aimed at graph-based question answering.
Vol. 8, no. 1, p. 9 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11977082/pdf/
Citations: 0
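Exposing a model's internal states to a visualization front end, as this tool does for GNN activations, is usually done with forward hooks in PyTorch. The sketch below uses a stand-in two-layer network rather than the GraphVQA models, which are not reproduced here:

```python
# Minimal sketch of capturing intermediate activations with forward hooks,
# the standard PyTorch mechanism for exporting internal states to a
# visualization front end. The tiny network is a hypothetical stand-in.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
captured = {}

def save_activation(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()  # stash for later visualization
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(save_activation(name))

model(torch.randn(1, 8))
print({k: tuple(v.shape) for k, v in captured.items()})
# {'0': (1, 16), '2': (1, 4)}
```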