{"title":"Dual modality prompt learning for visual question-grounded answering in robotic surgery","authors":"Yue Zhang, Wanshu Fan, Peixi Peng, Xin Yang, Dongsheng Zhou, Xiaopeng Wei","doi":"10.1186/s42492-024-00160-z","DOIUrl":"https://doi.org/10.1186/s42492-024-00160-z","url":null,"abstract":"With recent advancements in robotic surgery, notable strides have been made in visual question answering (VQA). Existing VQA systems typically generate textual answers to questions but fail to indicate the location of the relevant content within the image. This limitation restricts the interpretative capacity of the VQA models and their ability to explore specific image regions. To address this issue, this study proposes a grounded VQA model for robotic surgery, capable of localizing a specific region during answer prediction. Drawing inspiration from prompt learning in language models, a dual-modality prompt model was developed to enhance precise multimodal information interactions. Specifically, two complementary prompters were introduced to effectively integrate visual and textual prompts into the encoding process of the model. A visual complementary prompter merges visual prompt knowledge with visual information features to guide accurate localization. The textual complementary prompter aligns visual information with textual prompt knowledge and textual information, guiding textual information towards a more accurate inference of the answer. Additionally, a multiple iterative fusion strategy was adopted for comprehensive answer reasoning, to ensure high-quality generation of textual and grounded answers. The experimental results validate the effectiveness of the model, demonstrating its superiority over existing methods on the EndoVis-18 and EndoVis-17 datasets.","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"40 1","pages":""},"PeriodicalIF":2.8,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140636841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated analysis of pectoralis major thickness in pec-fly exercises: evolving from manual measurement to deep learning techniques","authors":"Shangyu Cai, Yongsheng Lin, Haoxin Chen, Zihao Huang, Yongjin Zhou, Yongping Zheng","doi":"10.1186/s42492-024-00159-6","DOIUrl":"https://doi.org/10.1186/s42492-024-00159-6","url":null,"abstract":"This study addresses a limitation of prior research on pectoralis major (PMaj) thickness changes during the pectoralis fly exercise using a wearable ultrasound imaging setup. Although previous studies used manual measurement and subjective evaluation, it is important to acknowledge the subsequent limitations of automating widespread applications. We then employed a deep learning model for image segmentation and automated measurement to solve the problem and study the additional quantitative supplementary information that could be provided. Our results revealed increased PMaj thickness changes in the coronal plane within the probe detection region when real-time ultrasound imaging (RUSI) visual biofeedback was incorporated, regardless of load intensity (50% or 80% of one-repetition maximum). Additionally, participants showed uniform thickness changes in the PMaj in response to enhanced RUSI biofeedback. Notably, the differences in PMaj thickness changes between load intensities were reduced by RUSI biofeedback, suggesting altered muscle activation strategies. We identified the optimal measurement location for the maximal PMaj thickness close to the rib end and emphasized the lightweight applicability of our model for fitness training and muscle assessment. Further studies can refine load intensities, investigate diverse parameters, and employ different network models to enhance accuracy. This study contributes to our understanding of the effects of muscle physiology and exercise training.","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"49 1","pages":""},"PeriodicalIF":2.8,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140586928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Three-dimensional reconstruction of industrial parts from a single image.","authors":"Zhenxing Xu, Aizeng Wang, Fei Hou, Gang Zhao","doi":"10.1186/s42492-024-00158-7","DOIUrl":"10.1186/s42492-024-00158-7","url":null,"abstract":"<p><p>This study proposes an image-based three-dimensional (3D) vector reconstruction of industrial parts that can generate non-uniform rational B-splines (NURBS) surfaces with high fidelity and flexibility. The contributions of this study include three parts: first, a dataset of two-dimensional images is constructed for typical industrial parts, including hexagonal head bolts, cylindrical gears, shoulder rings, hexagonal nuts, and cylindrical roller bearings; second, a deep learning algorithm is developed for parameter extraction of 3D industrial parts, which can determine the final 3D parameters and pose information of the reconstructed model using two new nets, CAD-ClassNet and CAD-ReconNet; and finally, a 3D vector shape reconstruction of mechanical parts is presented to generate NURBS from the obtained shape parameters. The final reconstructed models show that the proposed approach is highly accurate, efficient, and practical.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"7"},"PeriodicalIF":3.2,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11329437/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140294782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PlaqueNet: deep learning enabled coronary artery plaque segmentation from coronary computed tomography angiography.","authors":"Linyuan Wang, Xiaofeng Zhang, Congyu Tian, Shu Chen, Yongzhi Deng, Xiangyun Liao, Qiong Wang, Weixin Si","doi":"10.1186/s42492-024-00157-8","DOIUrl":"10.1186/s42492-024-00157-8","url":null,"abstract":"<p><p>Cardiovascular disease, primarily caused by atherosclerotic plaque formation, is a significant health concern. The early detection of these plaques is crucial for targeted therapies and reducing the risk of cardiovascular diseases. This study presents PlaqueNet, a solution for segmenting coronary artery plaques from coronary computed tomography angiography (CCTA) images. For feature extraction, the advanced residual net module was utilized, which integrates a deepwise residual optimization module into network branches, enhances feature extraction capabilities, avoiding information loss, and addresses gradient issues during training. To improve segmentation accuracy, a depthwise atrous spatial pyramid pooling based on bicubic efficient channel attention (DASPP-BICECA) module is introduced. The BICECA component amplifies the local feature sensitivity, whereas the DASPP component expands the network's information-gathering scope, resulting in elevated segmentation accuracy. Additionally, BINet, a module for joint network loss evaluation, is proposed. It optimizes the segmentation model without affecting the segmentation results. When combined with the DASPP-BICECA module, BINet enhances overall efficiency. The CCTA segmentation algorithm proposed in this study outperformed the other three comparative algorithms, achieving an intersection over Union of 87.37%, Dice of 93.26%, accuracy of 93.12%, mean intersection over Union of 93.68%, mean Dice of 96.63%, and mean pixel accuracy value of 96.55%.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"6"},"PeriodicalIF":3.2,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11349722/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140185849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flipover outperforms dropout in deep learning","authors":"Yuxuan Liang, Chuang Niu, Pingkun Yan, Ge Wang","doi":"10.1186/s42492-024-00153-y","DOIUrl":"https://doi.org/10.1186/s42492-024-00153-y","url":null,"abstract":"Flipover, an enhanced dropout technique, is introduced to improve the robustness of artificial neural networks. In contrast to dropout, which involves randomly removing certain neurons and their connections, flipover randomly selects neurons and reverts their outputs using a negative multiplier during training. This approach offers stronger regularization than conventional dropout, refining model performance by (1) mitigating overfitting, matching or even exceeding the efficacy of dropout; (2) amplifying robustness to noise; and (3) enhancing resilience against adversarial attacks. Extensive experiments across various neural networks affirm the effectiveness of flipover in deep learning.","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"139 1","pages":""},"PeriodicalIF":2.8,"publicationDate":"2024-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139922238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction: Multi-task approach based on combined CNN-transformer for efficient segmentation and classification of breast tumors in ultrasound images.","authors":"Jaouad Tagnamas, Hiba Ramadan, Ali Yahyaouy, Hamid Tairi","doi":"10.1186/s42492-024-00156-9","DOIUrl":"10.1186/s42492-024-00156-9","url":null,"abstract":"","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"5"},"PeriodicalIF":2.8,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10858012/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139708045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convolutional neural network based data interpretable framework for Alzheimer's treatment planning.","authors":"Sazia Parvin, Sonia Farhana Nimmy, Md Sarwar Kamal","doi":"10.1186/s42492-024-00154-x","DOIUrl":"10.1186/s42492-024-00154-x","url":null,"abstract":"<p><p>Alzheimer's disease (AD) is a neurological disorder that predominantly affects the brain. In the coming years, it is expected to spread rapidly, with limited progress in diagnostic techniques. Various machine learning (ML) and artificial intelligence (AI) algorithms have been employed to detect AD using single-modality data. However, recent developments in ML have enabled the application of these methods to multiple data sources and input modalities for AD prediction. In this study, we developed a framework that utilizes multimodal data (tabular data, magnetic resonance imaging (MRI) images, and genetic information) to classify AD. As part of the pre-processing phase, we generated a knowledge graph from the tabular data and MRI images. We employed graph neural networks for knowledge graph creation, and region-based convolutional neural network approach for image-to-knowledge graph generation. Additionally, we integrated various explainable AI (XAI) techniques to interpret and elucidate the prediction outcomes derived from multimodal data. Layer-wise relevance propagation was used to explain the layer-wise outcomes in the MRI images. We also incorporated submodular pick local interpretable model-agnostic explanations to interpret the decision-making process based on the tabular data provided. Genetic expression values play a crucial role in AD analysis. We used a graphical gene tree to identify genes associated with the disease. Moreover, a dashboard was designed to display XAI outcomes, enabling experts and medical professionals to easily comprehend the prediction results.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"3"},"PeriodicalIF":3.2,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10830981/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139651820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-task approach based on combined CNN-transformer for efficient segmentation and classification of breast tumors in ultrasound images.","authors":"Jaouad Tagnamas, Hiba Ramadan, Ali Yahyaouy, Hamid Tairi","doi":"10.1186/s42492-024-00155-w","DOIUrl":"10.1186/s42492-024-00155-w","url":null,"abstract":"<p><p>Accurate segmentation of breast ultrasound (BUS) images is crucial for early diagnosis and treatment of breast cancer. Further, the task of segmenting lesions in BUS images continues to pose significant challenges due to the limitations of convolutional neural networks (CNNs) in capturing long-range dependencies and obtaining global context information. Existing methods relying solely on CNNs have struggled to address these issues. Recently, ConvNeXts have emerged as a promising architecture for CNNs, while transformers have demonstrated outstanding performance in diverse computer vision tasks, including the analysis of medical images. In this paper, we propose a novel breast lesion segmentation network CS-Net that combines the strengths of ConvNeXt and Swin Transformer models to enhance the performance of the U-Net architecture. Our network operates on BUS images and adopts an end-to-end approach to perform segmentation. To address the limitations of CNNs, we design a hybrid encoder that incorporates modified ConvNeXt convolutions and Swin Transformer. Furthermore, to enhance capturing the spatial and channel attention in feature maps we incorporate the Coordinate Attention Module. Second, we design an Encoder-Decoder Features Fusion Module that facilitates the fusion of low-level features from the encoder with high-level semantic features from the decoder during the image reconstruction. Experimental results demonstrate the superiority of our network over state-of-the-art image segmentation methods for BUS lesions segmentation.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"2"},"PeriodicalIF":3.2,"publicationDate":"2024-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10811315/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139564831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CT-based radiomics: predicting early outcomes after percutaneous transluminal renal angioplasty in patients with severe atherosclerotic renal artery stenosis.","authors":"Jia Fu, Mengjie Fang, Zhiyong Lin, Jianxing Qiu, Min Yang, Jie Tian, Di Dong, Yinghua Zou","doi":"10.1186/s42492-023-00152-5","DOIUrl":"10.1186/s42492-023-00152-5","url":null,"abstract":"<p><p>This study aimed to comprehensively evaluate non-contrast computed tomography (CT)-based radiomics for predicting early outcomes in patients with severe atherosclerotic renal artery stenosis (ARAS) after percutaneous transluminal renal angioplasty (PTRA). A total of 52 patients were retrospectively recruited, and their clinical characteristics and pretreatment CT images were collected. During a median follow-up period of 3.7 mo, 18 patients were confirmed to have benefited from the treatment, defined as a 20% improvement from baseline in the estimated glomerular filtration rate. A deep learning network trained via self-supervised learning was used to enhance the imaging phenotype characteristics. Radiomics features, comprising 116 handcrafted features and 78 deep learning features, were extracted from the affected renal and perirenal adipose regions. More features from the latter were correlated with early outcomes, as determined by univariate analysis, and were visually represented in radiomics heatmaps and volcano plots. After using consensus clustering and the least absolute shrinkage and selection operator method for feature selection, five machine learning models were evaluated. Logistic regression yielded the highest leave-one-out cross-validation accuracy of 0.780 (95%CI: 0.660-0.880) for the renal signature, while the support vector machine achieved 0.865 (95%CI: 0.769-0.942) for the perirenal adipose signature. SHapley Additive exPlanations was used to visually interpret the prediction mechanism, and a histogram feature and a deep learning feature were identified as the most influential factors for the renal signature and perirenal adipose signature, respectively. Multivariate analysis revealed that both signatures served as independent predictive factors. When combined, they achieved an area under the receiver operating characteristic curve of 0.888 (95%CI: 0.784-0.992), indicating that the imaging phenotypes from both regions complemented each other. In conclusion, non-contrast CT-based radiomics can be leveraged to predict the early outcomes of PTRA, thereby assisting in identifying patients with ARAS suitable for this treatment, with perirenal adipose tissue providing added predictive value.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"1"},"PeriodicalIF":2.8,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10784441/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139425625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive feature extraction method for capsule endoscopy images","authors":"Dingchang Wu, Yinghui Wang, Haomiao Ma, Lingyu Ai, Jinlong Yang, Shaojie Zhang, Wei Li","doi":"10.1186/s42492-023-00151-6","DOIUrl":"https://doi.org/10.1186/s42492-023-00151-6","url":null,"abstract":"The traditional feature-extraction method of oriented FAST and rotated BRIEF (ORB) detects image features based on a fixed threshold; however, ORB descriptors do not distinguish features well in capsule endoscopy images. Therefore, a new feature detector that uses a new method for setting thresholds, called the adaptive threshold FAST and FREAK in capsule endoscopy images (AFFCEI), is proposed. This method, first constructs an image pyramid and then calculates the thresholds of pixels based on the gray value contrast of all pixels in the local neighborhood of the image, to achieve adaptive image feature extraction in each layer of the pyramid. Subsequently, the features are expressed by the FREAK descriptor, which can enhance the discrimination of the features extracted from the stomach image. Finally, a refined matching is obtained by applying the grid-based motion statistics algorithm to the result of Hamming distance, whereby mismatches are rejected using the RANSAC algorithm. Compared with the ASIFT method, which previously had the best performance, the average running time of AFFCEI was 4/5 that of ASIFT, and the average matching score improved by 5% when tracking features in a moving capsule endoscope.","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"27 1-2","pages":""},"PeriodicalIF":2.8,"publicationDate":"2023-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138566135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}