{"title":"Dual modality prompt learning for visual question-grounded answering in robotic surgery","authors":"Yue Zhang, Wanshu Fan, Peixi Peng, Xin Yang, Dongsheng Zhou, Xiaopeng Wei","doi":"10.1186/s42492-024-00160-z","DOIUrl":"https://doi.org/10.1186/s42492-024-00160-z","url":null,"abstract":"With recent advancements in robotic surgery, notable strides have been made in visual question answering (VQA). Existing VQA systems typically generate textual answers to questions but fail to indicate the location of the relevant content within the image. This limitation restricts the interpretative capacity of the VQA models and their ability to explore specific image regions. To address this issue, this study proposes a grounded VQA model for robotic surgery, capable of localizing a specific region during answer prediction. Drawing inspiration from prompt learning in language models, a dual-modality prompt model was developed to enhance precise multimodal information interactions. Specifically, two complementary prompters were introduced to effectively integrate visual and textual prompts into the encoding process of the model. A visual complementary prompter merges visual prompt knowledge with visual information features to guide accurate localization. The textual complementary prompter aligns visual information with textual prompt knowledge and textual information, guiding textual information towards a more accurate inference of the answer. Additionally, a multiple iterative fusion strategy was adopted for comprehensive answer reasoning, to ensure high-quality generation of textual and grounded answers. The experimental results validate the effectiveness of the model, demonstrating its superiority over existing methods on the EndoVis-18 and EndoVis-17 datasets.","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"40 1","pages":""},"PeriodicalIF":2.8,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140636841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated analysis of pectoralis major thickness in pec-fly exercises: evolving from manual measurement to deep learning techniques","authors":"Shangyu Cai, Yongsheng Lin, Haoxin Chen, Zihao Huang, Yongjin Zhou, Yongping Zheng","doi":"10.1186/s42492-024-00159-6","DOIUrl":"https://doi.org/10.1186/s42492-024-00159-6","url":null,"abstract":"This study addresses a limitation of prior research on pectoralis major (PMaj) thickness changes during the pectoralis fly exercise using a wearable ultrasound imaging setup. Although previous studies used manual measurement and subjective evaluation, it is important to acknowledge the subsequent limitations of automating widespread applications. We then employed a deep learning model for image segmentation and automated measurement to solve the problem and study the additional quantitative supplementary information that could be provided. Our results revealed increased PMaj thickness changes in the coronal plane within the probe detection region when real-time ultrasound imaging (RUSI) visual biofeedback was incorporated, regardless of load intensity (50% or 80% of one-repetition maximum). Additionally, participants showed uniform thickness changes in the PMaj in response to enhanced RUSI biofeedback. Notably, the differences in PMaj thickness changes between load intensities were reduced by RUSI biofeedback, suggesting altered muscle activation strategies. We identified the optimal measurement location for the maximal PMaj thickness close to the rib end and emphasized the lightweight applicability of our model for fitness training and muscle assessment. Further studies can refine load intensities, investigate diverse parameters, and employ different network models to enhance accuracy. This study contributes to our understanding of the effects of muscle physiology and exercise training.","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"49 1","pages":""},"PeriodicalIF":2.8,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140586928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Three-dimensional reconstruction of industrial parts from a single image.","authors":"Zhenxing Xu, Aizeng Wang, Fei Hou, Gang Zhao","doi":"10.1186/s42492-024-00158-7","DOIUrl":"10.1186/s42492-024-00158-7","url":null,"abstract":"<p><p>This study proposes an image-based three-dimensional (3D) vector reconstruction of industrial parts that can generate non-uniform rational B-splines (NURBS) surfaces with high fidelity and flexibility. The contributions of this study include three parts: first, a dataset of two-dimensional images is constructed for typical industrial parts, including hexagonal head bolts, cylindrical gears, shoulder rings, hexagonal nuts, and cylindrical roller bearings; second, a deep learning algorithm is developed for parameter extraction of 3D industrial parts, which can determine the final 3D parameters and pose information of the reconstructed model using two new nets, CAD-ClassNet and CAD-ReconNet; and finally, a 3D vector shape reconstruction of mechanical parts is presented to generate NURBS from the obtained shape parameters. The final reconstructed models show that the proposed approach is highly accurate, efficient, and practical.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"7"},"PeriodicalIF":3.2,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11329437/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140294782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PlaqueNet: deep learning enabled coronary artery plaque segmentation from coronary computed tomography angiography.","authors":"Linyuan Wang, Xiaofeng Zhang, Congyu Tian, Shu Chen, Yongzhi Deng, Xiangyun Liao, Qiong Wang, Weixin Si","doi":"10.1186/s42492-024-00157-8","DOIUrl":"10.1186/s42492-024-00157-8","url":null,"abstract":"<p><p>Cardiovascular disease, primarily caused by atherosclerotic plaque formation, is a significant health concern. The early detection of these plaques is crucial for targeted therapies and reducing the risk of cardiovascular diseases. This study presents PlaqueNet, a solution for segmenting coronary artery plaques from coronary computed tomography angiography (CCTA) images. For feature extraction, the advanced residual net module was utilized, which integrates a deepwise residual optimization module into network branches, enhances feature extraction capabilities, avoiding information loss, and addresses gradient issues during training. To improve segmentation accuracy, a depthwise atrous spatial pyramid pooling based on bicubic efficient channel attention (DASPP-BICECA) module is introduced. The BICECA component amplifies the local feature sensitivity, whereas the DASPP component expands the network's information-gathering scope, resulting in elevated segmentation accuracy. Additionally, BINet, a module for joint network loss evaluation, is proposed. It optimizes the segmentation model without affecting the segmentation results. When combined with the DASPP-BICECA module, BINet enhances overall efficiency. The CCTA segmentation algorithm proposed in this study outperformed the other three comparative algorithms, achieving an intersection over Union of 87.37%, Dice of 93.26%, accuracy of 93.12%, mean intersection over Union of 93.68%, mean Dice of 96.63%, and mean pixel accuracy value of 96.55%.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"6"},"PeriodicalIF":3.2,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11349722/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140185849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flipover outperforms dropout in deep learning","authors":"Yuxuan Liang, Chuang Niu, Pingkun Yan, Ge Wang","doi":"10.1186/s42492-024-00153-y","DOIUrl":"https://doi.org/10.1186/s42492-024-00153-y","url":null,"abstract":"Flipover, an enhanced dropout technique, is introduced to improve the robustness of artificial neural networks. In contrast to dropout, which involves randomly removing certain neurons and their connections, flipover randomly selects neurons and reverts their outputs using a negative multiplier during training. This approach offers stronger regularization than conventional dropout, refining model performance by (1) mitigating overfitting, matching or even exceeding the efficacy of dropout; (2) amplifying robustness to noise; and (3) enhancing resilience against adversarial attacks. Extensive experiments across various neural networks affirm the effectiveness of flipover in deep learning.","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"139 1","pages":""},"PeriodicalIF":2.8,"publicationDate":"2024-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139922238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction: Multi-task approach based on combined CNN-transformer for efficient segmentation and classification of breast tumors in ultrasound images.","authors":"Jaouad Tagnamas, Hiba Ramadan, Ali Yahyaouy, Hamid Tairi","doi":"10.1186/s42492-024-00156-9","DOIUrl":"10.1186/s42492-024-00156-9","url":null,"abstract":"","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"5"},"PeriodicalIF":2.8,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10858012/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139708045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convolutional neural network based data interpretable framework for Alzheimer's treatment planning.","authors":"Sazia Parvin, Sonia Farhana Nimmy, Md Sarwar Kamal","doi":"10.1186/s42492-024-00154-x","DOIUrl":"10.1186/s42492-024-00154-x","url":null,"abstract":"<p><p>Alzheimer's disease (AD) is a neurological disorder that predominantly affects the brain. In the coming years, it is expected to spread rapidly, with limited progress in diagnostic techniques. Various machine learning (ML) and artificial intelligence (AI) algorithms have been employed to detect AD using single-modality data. However, recent developments in ML have enabled the application of these methods to multiple data sources and input modalities for AD prediction. In this study, we developed a framework that utilizes multimodal data (tabular data, magnetic resonance imaging (MRI) images, and genetic information) to classify AD. As part of the pre-processing phase, we generated a knowledge graph from the tabular data and MRI images. We employed graph neural networks for knowledge graph creation, and region-based convolutional neural network approach for image-to-knowledge graph generation. Additionally, we integrated various explainable AI (XAI) techniques to interpret and elucidate the prediction outcomes derived from multimodal data. Layer-wise relevance propagation was used to explain the layer-wise outcomes in the MRI images. We also incorporated submodular pick local interpretable model-agnostic explanations to interpret the decision-making process based on the tabular data provided. Genetic expression values play a crucial role in AD analysis. We used a graphical gene tree to identify genes associated with the disease. Moreover, a dashboard was designed to display XAI outcomes, enabling experts and medical professionals to easily comprehend the prediction results.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"3"},"PeriodicalIF":3.2,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10830981/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139651820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-task approach based on combined CNN-transformer for efficient segmentation and classification of breast tumors in ultrasound images.","authors":"Jaouad Tagnamas, Hiba Ramadan, Ali Yahyaouy, Hamid Tairi","doi":"10.1186/s42492-024-00155-w","DOIUrl":"10.1186/s42492-024-00155-w","url":null,"abstract":"<p><p>Accurate segmentation of breast ultrasound (BUS) images is crucial for early diagnosis and treatment of breast cancer. Further, the task of segmenting lesions in BUS images continues to pose significant challenges due to the limitations of convolutional neural networks (CNNs) in capturing long-range dependencies and obtaining global context information. Existing methods relying solely on CNNs have struggled to address these issues. Recently, ConvNeXts have emerged as a promising architecture for CNNs, while transformers have demonstrated outstanding performance in diverse computer vision tasks, including the analysis of medical images. In this paper, we propose a novel breast lesion segmentation network CS-Net that combines the strengths of ConvNeXt and Swin Transformer models to enhance the performance of the U-Net architecture. Our network operates on BUS images and adopts an end-to-end approach to perform segmentation. To address the limitations of CNNs, we design a hybrid encoder that incorporates modified ConvNeXt convolutions and Swin Transformer. Furthermore, to enhance capturing the spatial and channel attention in feature maps we incorporate the Coordinate Attention Module. Second, we design an Encoder-Decoder Features Fusion Module that facilitates the fusion of low-level features from the encoder with high-level semantic features from the decoder during the image reconstruction. Experimental results demonstrate the superiority of our network over state-of-the-art image segmentation methods for BUS lesions segmentation.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"2"},"PeriodicalIF":3.2,"publicationDate":"2024-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10811315/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139564831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CT-based radiomics: predicting early outcomes after percutaneous transluminal renal angioplasty in patients with severe atherosclerotic renal artery stenosis.","authors":"Jia Fu, Mengjie Fang, Zhiyong Lin, Jianxing Qiu, Min Yang, Jie Tian, Di Dong, Yinghua Zou","doi":"10.1186/s42492-023-00152-5","DOIUrl":"10.1186/s42492-023-00152-5","url":null,"abstract":"<p><p>This study aimed to comprehensively evaluate non-contrast computed tomography (CT)-based radiomics for predicting early outcomes in patients with severe atherosclerotic renal artery stenosis (ARAS) after percutaneous transluminal renal angioplasty (PTRA). A total of 52 patients were retrospectively recruited, and their clinical characteristics and pretreatment CT images were collected. During a median follow-up period of 3.7 mo, 18 patients were confirmed to have benefited from the treatment, defined as a 20% improvement from baseline in the estimated glomerular filtration rate. A deep learning network trained via self-supervised learning was used to enhance the imaging phenotype characteristics. Radiomics features, comprising 116 handcrafted features and 78 deep learning features, were extracted from the affected renal and perirenal adipose regions. More features from the latter were correlated with early outcomes, as determined by univariate analysis, and were visually represented in radiomics heatmaps and volcano plots. After using consensus clustering and the least absolute shrinkage and selection operator method for feature selection, five machine learning models were evaluated. Logistic regression yielded the highest leave-one-out cross-validation accuracy of 0.780 (95%CI: 0.660-0.880) for the renal signature, while the support vector machine achieved 0.865 (95%CI: 0.769-0.942) for the perirenal adipose signature. SHapley Additive exPlanations was used to visually interpret the prediction mechanism, and a histogram feature and a deep learning feature were identified as the most influential factors for the renal signature and perirenal adipose signature, respectively. Multivariate analysis revealed that both signatures served as independent predictive factors. When combined, they achieved an area under the receiver operating characteristic curve of 0.888 (95%CI: 0.784-0.992), indicating that the imaging phenotypes from both regions complemented each other. In conclusion, non-contrast CT-based radiomics can be leveraged to predict the early outcomes of PTRA, thereby assisting in identifying patients with ARAS suitable for this treatment, with perirenal adipose tissue providing added predictive value.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"1"},"PeriodicalIF":2.8,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10784441/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139425625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive feature extraction method for capsule endoscopy images","authors":"Dingchang Wu, Yinghui Wang, Haomiao Ma, Lingyu Ai, Jinlong Yang, Shaojie Zhang, Wei Li","doi":"10.1186/s42492-023-00151-6","DOIUrl":"https://doi.org/10.1186/s42492-023-00151-6","url":null,"abstract":"The traditional feature-extraction method of oriented FAST and rotated BRIEF (ORB) detects image features based on a fixed threshold; however, ORB descriptors do not distinguish features well in capsule endoscopy images. Therefore, a new feature detector that uses a new method for setting thresholds, called the adaptive threshold FAST and FREAK in capsule endoscopy images (AFFCEI), is proposed. This method, first constructs an image pyramid and then calculates the thresholds of pixels based on the gray value contrast of all pixels in the local neighborhood of the image, to achieve adaptive image feature extraction in each layer of the pyramid. Subsequently, the features are expressed by the FREAK descriptor, which can enhance the discrimination of the features extracted from the stomach image. Finally, a refined matching is obtained by applying the grid-based motion statistics algorithm to the result of Hamming distance, whereby mismatches are rejected using the RANSAC algorithm. Compared with the ASIFT method, which previously had the best performance, the average running time of AFFCEI was 4/5 that of ASIFT, and the average matching score improved by 5% when tracking features in a moving capsule endoscope.","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"27 1-2","pages":""},"PeriodicalIF":2.8,"publicationDate":"2023-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138566135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}