Image and Vision Computing: Latest Publications

DFEDC: Dual fusion with enhanced deformable convolution for medical image segmentation
IF 4.2 | CAS Q3 (Computer Science)
Image and Vision Computing, Volume 151, Article 105277 | Pub Date: 2024-09-13 | DOI: 10.1016/j.imavis.2024.105277
Xian Fang, Yueqian Pan, Qiaohong Chen
Abstract: Considering the complexity of lesion regions in medical images, current research relying on CNNs typically employs large-kernel convolutions to expand the receptive field and enhance segmentation quality. However, these convolution methods are hindered by substantial computational requirements and a limited capacity to extract contextual and multi-scale information, making it challenging to segment complex regions efficiently. To address this issue, we propose a dual fusion with enhanced deformable convolution network, namely DFEDC, which dynamically adjusts the receptive field and simultaneously integrates multi-scale feature information to effectively segment complex lesion areas and process boundaries. First, we combine global channel and spatial fusion in a serial way, which integrates and reuses global channel attention and fully connected layers to achieve lightweight extraction of channel and spatial information. Additionally, we design a structured deformable convolution (SDC) that structures deformable convolution with inceptions and large kernel attention, and enhances the learning of offsets through parallel fusion to efficiently extract multi-scale feature information. To compensate for the loss of spatial information of SDC, we introduce a hybrid 2D and 3D feature extraction module to transform feature extraction from a single dimension to a fusion of 2D and 3D. Extensive experimental results on the Synapse, ACDC, and ISIC-2018 datasets demonstrate that our proposed DFEDC achieves superior results.
Citations: 0
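The SDC module above builds on deformable convolution with learned sampling offsets. As a point of reference only, here is a minimal PyTorch sketch of a deformable-convolution block with an offset branch; the channel sizes, 3x3 kernel, and BatchNorm/ReLU choices are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch (not the authors' code): a deformable-convolution block whose
# sampling offsets are predicted from the input feature map itself.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        # One (dx, dy) pair per kernel position, predicted per spatial location.
        self.offset_head = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)
        self.norm = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_head(x)            # (N, 2*k*k, H, W) sampling displacements
        return self.act(self.norm(self.deform(x, offsets)))

if __name__ == "__main__":
    feat = torch.randn(1, 64, 56, 56)            # dummy feature map
    print(DeformableBlock(64, 128)(feat).shape)  # torch.Size([1, 128, 56, 56])
```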
VLAI: Exploration and Exploitation based on Visual-Language Aligned Information for Robotic Object Goal Navigation
IF 4.2 | CAS Q3 (Computer Science)
Image and Vision Computing, Volume 151, Article 105259 | Pub Date: 2024-09-12 | DOI: 10.1016/j.imavis.2024.105259
Haonan Luo, Yijie Zeng, Li Yang, Kexun Chen, Zhixuan Shen, Fengmao Lv
Abstract: Object Goal Navigation (ObjectNav) is the task in which an agent must navigate to an instance of a specific category in an unseen environment, using visual observations, within a limited number of time steps. This work plays a significant role in enhancing the efficiency of locating specific items in indoor spaces and assisting individuals in completing various tasks, as well as providing support for people with disabilities. Efficient ObjectNav in unfamiliar environments requires global perception and an understanding of the spatial and semantic regularities of the environment layout. In this work, we propose an explicit-prediction method called VLAI that utilizes visual-language alignment information to guide the agent's exploration, unlike previous navigation methods based on frontier potential prediction or egocentric map completion, which only leverage visual observations to construct semantic maps and thus fail to help the agent develop a better global perception. Specifically, when predicting long-term goals, we retrieve previously saved visual observations to obtain visual information around the frontiers based on their position on the incrementally built, incomplete semantic map. Then, we apply our designed Chat Describer to this visual information to obtain detailed frontier object descriptions. The Chat Describer, a novel automatic-questioning approach for visual-to-language conversion, is composed of a Large Language Model (LLM) and a visual-to-language model (VLM) with visual question-answering functionality. In addition, we obtain the semantic similarity between the target object category and frontier object categories. Ultimately, by combining the semantic similarity and the boundary descriptions, the agent can predict long-term goals more accurately. Our experiments on the Gibson and HM3D datasets reveal that our VLAI approach yields significantly better results compared to earlier methods. The code is released at https://github.com/31539lab/VLAI.
Citations: 0
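One ingredient the abstract describes is the semantic similarity between the target category and the object categories observed around each frontier. The sketch below shows one plausible way to score frontiers with such a similarity; `embed_text` is a hypothetical placeholder for a real text encoder, and the whole scoring rule is an assumption rather than the paper's method.

```python
# Minimal sketch (assumptions, not the authors' code): score each frontier by the
# maximum cosine similarity between the target category and nearby object categories.
import torch
import torch.nn.functional as F

def embed_text(names):
    # Placeholder text encoder: deterministic pseudo-embeddings seeded by the string.
    vecs = []
    for name in names:
        g = torch.Generator().manual_seed(sum(ord(c) for c in name))
        vecs.append(torch.randn(512, generator=g))
    return F.normalize(torch.stack(vecs), dim=-1)

def score_frontiers(target: str, frontier_objects: dict) -> dict:
    """Return one score per frontier id: max similarity between the target
    category and any object category described around that frontier."""
    t = embed_text([target])                       # (1, D)
    scores = {}
    for fid, names in frontier_objects.items():
        if not names:
            scores[fid] = 0.0
            continue
        sims = embed_text(names) @ t.T             # (len(names), 1)
        scores[fid] = sims.max().item()
    return scores

if __name__ == "__main__":
    frontiers = {0: ["sofa", "tv"], 1: ["sink", "oven"], 2: []}
    print(score_frontiers("couch", frontiers))
```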
AGIL-SwinT: Attention-guided inconsistency learning for face forgery detection
IF 4.2 | CAS Q3 (Computer Science)
Image and Vision Computing, Volume 151, Article 105274 | Pub Date: 2024-09-12 | DOI: 10.1016/j.imavis.2024.105274
Wuti Xiong, Haoyu Chen, Guoying Zhao, Xiaobai Li
Abstract: Face forgery detection (FFD) plays a vital role in maintaining the security and integrity of various information and media systems. Forgery inconsistency caused by manipulation techniques has proven effective for generalizing to unseen data domains. However, most existing works rely on pixel-level forgery annotations to learn forgery inconsistency. To address this problem, we propose a novel Swin Transformer-based method, AGIL-SwinT, that can effectively learn forgery inconsistency using only video-level labels. Specifically, we first leverage the Swin Transformer to generate an initial mask for the forgery regions. Then, we introduce an attention-guided inconsistency learning module that uses unsupervised learning to learn inconsistency from attention. The learned inconsistency is used to revise the initial mask and enhance forgery detection. In addition, we introduce a forgery mask refinement module to obtain reliable inconsistency labels for supervising inconsistency learning and to ensure the mask is aligned with the forgery boundaries. We conduct extensive experiments on multiple FFD benchmarks, including intra-dataset, cross-dataset and cross-manipulation testing. The experimental results demonstrate that our method significantly outperforms existing methods and generalizes well to unseen datasets and manipulation categories. Our code is available at https://github.com/woody-xiong/AGIL-SwinT.
Citations: 0
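The revision step above combines an initial forgery mask with inconsistency information derived from attention. The following is a speculative sketch of such a mask-revision module; the layer choices and the way attention features enter are assumptions, not the released architecture.

```python
# Minimal sketch (an assumption about the general idea, not the released code):
# revise an initial forgery mask with an attention-derived inconsistency map.
import torch
import torch.nn as nn

class MaskReviser(nn.Module):
    def __init__(self, channels: int = 96):
        super().__init__()
        # Turn backbone attention features into a single-channel inconsistency map.
        self.inconsistency_head = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.GELU(),
            nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(2, 1, kernel_size=3, padding=1)

    def forward(self, attn_feat: torch.Tensor, init_mask: torch.Tensor) -> torch.Tensor:
        inconsistency = self.inconsistency_head(attn_feat)            # (N, 1, H, W)
        revised = self.fuse(torch.cat([init_mask, inconsistency], dim=1))
        return torch.sigmoid(revised)                                 # refined forgery mask

if __name__ == "__main__":
    feat, mask = torch.randn(2, 96, 56, 56), torch.rand(2, 1, 56, 56)
    print(MaskReviser()(feat, mask).shape)                            # torch.Size([2, 1, 56, 56])
```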
Boosting certified robustness via an expectation-based similarity regularization
IF 4.2 | CAS Q3 (Computer Science)
Image and Vision Computing, Volume 151, Article 105272 | Pub Date: 2024-09-12 | DOI: 10.1016/j.imavis.2024.105272
Jiawen Li, Kun Fang, Xiaolin Huang, Jie Yang
Abstract: A certifiably robust classifier is one that is theoretically guaranteed to provide robust predictions against any adversarial attack under certain conditions. Recent defense methods aim to regularize predictions by ensuring consistency across diverse perturbed samplings around the same sample, thus enhancing the certified robustness of the classifier. However, when visualizing latent representations from classifiers trained with existing defense methods, we observe that noisy samplings of other classes are still easily found near a single sample, undermining the confidence in the neighborhood of inputs required by certified robustness. Motivated by this observation, a novel training method, namely Expectation-based Similarity Regularization for Randomized Smoothing (ESR-RS), is proposed to optimize the distance between samples using metric learning. To meet the requirement of certified robustness, ESR-RS focuses on the average performance of the base classifier and adopts the expected feature, approximated by the average value of multiple Gaussian-corrupted samplings around every sample, to compute similarity scores between samples in the latent space. The metric learning loss is then applied to maximize the representation similarity within the same class and minimize it between different classes. In addition, an adaptive weight correlated with classification performance is used to control the strength of the proposed similarity regularization. Extensive experiments have verified that our method yields stronger certified robustness than multiple defense methods without heavy computational costs.
Citations: 0
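Two pieces of ESR-RS are concrete enough to sketch: the expected feature, approximated by averaging features of Gaussian-corrupted copies of each sample, and a similarity term that favors high within-class and low between-class similarity. The code below is a minimal illustration under those assumptions, not the paper's exact loss (the adaptive weight is omitted).

```python
# Minimal sketch (assumptions, not the authors' code): expected features over
# Gaussian-corrupted samplings plus a simple similarity regularizer.
import torch
import torch.nn.functional as F

def expected_features(encoder, x, sigma: float = 0.25, n_samples: int = 4):
    """Average the encoder's features over Gaussian-noised copies of x."""
    feats = [encoder(x + sigma * torch.randn_like(x)) for _ in range(n_samples)]
    return torch.stack(feats, dim=0).mean(dim=0)                     # (N, D)

def similarity_regularizer(feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Mean between-class cosine similarity minus mean within-class similarity;
    minimizing it aligns same-class features and separates different classes."""
    z = F.normalize(feats, dim=-1)
    sim = z @ z.T                                                    # (N, N)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool)
    pos = sim[same & ~eye].mean() if (same & ~eye).any() else sim.new_tensor(0.0)
    neg = sim[~same].mean() if (~same).any() else sim.new_tensor(0.0)
    return neg - pos                                                 # lower is better

if __name__ == "__main__":
    enc = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
    x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
    print(similarity_regularizer(expected_features(enc, x), y))
```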
DFCNet+: Cross-modal dynamic feature contrast net for continuous sign language recognition
IF 4.2 | CAS Q3 (Computer Science)
Image and Vision Computing, Volume 151, Article 105260 | Pub Date: 2024-09-11 | DOI: 10.1016/j.imavis.2024.105260
Yuan Feng, Nuoyi Chen, Yumeng Wu, Caoyu Jiang, Sheng Liu, Shengyong Chen
Abstract: In sign language communication, the combination of hand signs and facial expressions is used to convey messages in a fluid manner. Accurate interpretation relies heavily on understanding the context of these signs. Current methods, however, often focus on static images, missing the continuous flow and the story that unfolds through successive movements in sign language. To address this constraint, our research introduces the Dynamic Feature Contrast Net Plus (DFCNet+), a novel model that incorporates both dynamic feature extraction and cross-modal learning. The dynamic feature extraction module of DFCNet+ uses dynamic trajectory capture to monitor and record motion across frames and applies key features as an enhancement tool that highlights pixels critical for recognizing important sign language movements, allowing the model to follow the temporal variation of the signs. In the cross-modal learning module, we depart from the conventional approach of aligning video frames with textual descriptions. Instead, we adopt a gloss-level alignment, which provides a more detailed match between the visual signals and their corresponding text glosses, capturing the intricate relationship between what is seen and the associated text. The enhanced proficiency of DFCNet+ in discerning inter-frame details translates to heightened precision on benchmarks such as PHOENIX14, PHOENIX14-T and CSL-Daily. Such performance underscores its advantage in dynamic feature capture and inter-modal learning compared to conventional approaches to sign language interpretation. Our code is available at https://github.com/fyzjut/DFCNet_Plus.
Citations: 0
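The gloss-level alignment described above matches visual features of gloss segments to their text glosses rather than whole videos to sentences. Below is a hedged sketch of one such alignment objective, written as a symmetric InfoNCE-style loss; the loss form and temperature are assumptions, not the paper's formulation.

```python
# Minimal sketch (an assumption, not the released DFCNet+ code): gloss-level
# alignment between pooled visual segment features and gloss text embeddings.
import torch
import torch.nn.functional as F

def gloss_alignment_loss(segment_feats: torch.Tensor,
                         gloss_embeds: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    """segment_feats: (G, D) pooled visual features, one row per gloss segment.
    gloss_embeds:  (G, D) text embeddings of the corresponding glosses."""
    v = F.normalize(segment_feats, dim=-1)
    t = F.normalize(gloss_embeds, dim=-1)
    logits = v @ t.T / temperature                     # (G, G) similarity matrix
    targets = torch.arange(len(v))                     # i-th segment matches i-th gloss
    # Symmetric InfoNCE-style objective over segments and glosses.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

if __name__ == "__main__":
    G, D = 6, 256
    print(gloss_alignment_loss(torch.randn(G, D), torch.randn(G, D)))
```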
UAV image object detection based on self-attention guidance and global feature fusion
IF 4.2 | CAS Q3 (Computer Science)
Image and Vision Computing, Volume 151, Article 105262 | Pub Date: 2024-09-10 | DOI: 10.1016/j.imavis.2024.105262
Jing Bai, Haiyang Hu, Xiaojing Liu, Shanna Zhuang, Zhengyou Wang
Abstract: Unmanned aerial vehicle (UAV) image object detection has garnered considerable attention in fields such as intelligent transportation, urban management and agricultural monitoring. However, it still faces key challenges in practice: deficient multi-scale feature extraction and inaccurate detection in complex scenes and for small targets. To address these challenges, we propose a novel UAV image object detection network based on self-attention guidance and global feature fusion, named SGGF-Net. First, to optimize feature extraction from a global perspective and enhance target localization precision, a global feature extraction module (GFEM) is introduced that exploits the self-attention mechanism to capture and integrate long-range dependencies within images. Second, a normal distribution-based prior assigner (NDPA) is developed by measuring the resemblance between the ground truth and the priors, which improves the precision of target position matching and thus addresses the inaccurate localization of small targets. Furthermore, we design an attention-guided ROI pooling module (ARPM) that deeply fuses multilevel features to optimize the integration of multi-scale features and improve the quality of feature representation. Finally, experimental results demonstrate the effectiveness of the proposed SGGF-Net approach.
Citations: 0
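The NDPA measures the resemblance between priors and ground truth under a normal-distribution assumption. The sketch below is a guess at the flavor of such a score, a Gaussian over normalized center offsets and log size ratios; the exact formula and the sigma values are assumptions, not the paper's definition.

```python
# Minimal sketch (my assumption, not the paper's exact formula): a Gaussian
# matching score between prior boxes and a ground-truth box, giving soft
# assignment weights instead of a hard IoU threshold.
import torch

def gaussian_prior_scores(priors: torch.Tensor, gt: torch.Tensor,
                          sigma_xy: float = 0.5, sigma_wh: float = 0.5) -> torch.Tensor:
    """priors: (N, 4) boxes as (cx, cy, w, h); gt: (4,) box as (cx, cy, w, h).
    Returns an (N,) score in (0, 1] per prior."""
    d_xy = (priors[:, :2] - gt[:2]) / gt[2:]           # center offset, scale-normalized
    d_wh = torch.log(priors[:, 2:] / gt[2:])           # log size ratio
    dist2 = (d_xy ** 2).sum(-1) / sigma_xy ** 2 + (d_wh ** 2).sum(-1) / sigma_wh ** 2
    return torch.exp(-0.5 * dist2)

if __name__ == "__main__":
    priors = torch.tensor([[10., 10., 8., 8.], [12., 11., 9., 7.], [40., 40., 8., 8.]])
    gt = torch.tensor([11., 10., 8., 8.])
    print(gaussian_prior_scores(priors, gt))           # nearby priors score near 1
```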
Automatic deep sparse clustering with a dynamic population-based evolutionary algorithm using reinforcement learning and transfer learning
IF 4.2 | CAS Q3 (Computer Science)
Image and Vision Computing, Volume 151, Article 105258 | Pub Date: 2024-09-10 | DOI: 10.1016/j.imavis.2024.105258
Parham Hadikhani, Daphne Teck Ching Lai, Wee-Hong Ong, Mohammad H. Nadimi-Shahraki
Abstract: Clustering data effectively remains a significant challenge in machine learning, particularly when the optimal number of clusters is unknown. Traditional deep clustering methods often struggle to balance local and global search, leading to premature convergence and inefficiency. To address these issues, we introduce ADSC-DPE-RT (Automatic Deep Sparse Clustering with a Dynamic Population-based Evolutionary Algorithm using Reinforcement Learning and Transfer Learning), a novel deep clustering approach. ADSC-DPE-RT builds on Multi-Trial Vector-based Differential Evolution (MTDE), an algorithm that integrates sparse auto-encoding and manifold learning to enable automatic clustering without prior knowledge of the cluster count. However, MTDE's fixed population size can lead to either prolonged computation or premature convergence. Our approach introduces a dynamic population generation technique guided by Reinforcement Learning (RL) and Markov Decision Process (MDP) principles. This allows flexible adjustment of the population size, preventing premature convergence and reducing computation time. Additionally, we incorporate Generative Adversarial Networks (GANs) to facilitate dynamic knowledge transfer between MTDE strategies, enhancing diversity and accelerating convergence towards the global optimum. This is the first work to address the dynamic population issue in deep clustering through RL, combined with Transfer Learning to optimize evolutionary algorithms. Our results demonstrate significant improvements in clustering performance, positioning ADSC-DPE-RT as a competitive alternative to state-of-the-art deep clustering methods.
Citations: 0
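The dynamic population mechanism treats population-size changes as decisions in an MDP rewarded by search progress. The toy loop below illustrates that idea with a tiny epsilon-greedy value table; everything here (actions, state bucketing, the stand-in fitness) is an illustrative assumption, not the authors' RL formulation.

```python
# Minimal sketch (a loose illustration, not the authors' algorithm): choose
# "grow / keep / shrink the population" with an epsilon-greedy policy, rewarded
# by the improvement of the best clustering fitness.
import random

ACTIONS = (-10, 0, +10)                 # change in population size

def choose_action(q, state, eps=0.2):
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

def update_q(q, state, action, reward, lr=0.5):
    # Bandit-style running-average update, deliberately kept simple (no bootstrapping).
    key = (state, action)
    q[key] = q.get(key, 0.0) + lr * (reward - q.get(key, 0.0))

if __name__ == "__main__":
    random.seed(0)
    q, pop_size, best_fitness = {}, 50, 0.0
    for step in range(20):
        state = pop_size // 10                           # coarse state bucket
        action = choose_action(q, state)
        pop_size = max(10, min(200, pop_size + action))
        # Stand-in for one generation of the evolutionary clustering search.
        new_fitness = best_fitness + random.uniform(-0.01, 0.05)
        reward = new_fitness - best_fitness              # improvement as reward
        best_fitness = max(best_fitness, new_fitness)
        update_q(q, state, action, reward)
    print(pop_size, round(best_fitness, 3))
```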
Nighttime scene understanding with label transfer scene parser
IF 4.2 | CAS Q3 (Computer Science)
Image and Vision Computing, Volume 151, Article 105257 | Pub Date: 2024-09-08 | DOI: 10.1016/j.imavis.2024.105257
Thanh-Danh Nguyen, Nguyen Phan, Tam V. Nguyen, Vinh-Tiep Nguyen, Minh-Triet Tran
Abstract: Semantic segmentation plays a crucial role in traffic scene understanding, especially in nighttime conditions. This paper tackles the task of semantic segmentation in nighttime scenes. The largest challenge of this task is the lack of annotated nighttime images to train a deep learning-based scene parser. Existing annotated datasets are abundant for daytime conditions but scarce for nighttime due to the high annotation cost. Thus, we propose a novel Label Transfer Scene Parser (LTSP) framework for nighttime scene semantic segmentation that leverages daytime annotation transfer. Our framework performs segmentation in the dark without training on real nighttime annotated data. In particular, we propose translating daytime images to nighttime conditions to obtain more annotated data in an efficient way. In addition, we utilize the pseudo-labels inferred from unlabeled nighttime scenes to further train the scene parser. The novelty of our work is the ability to perform nighttime segmentation via daytime annotated labels and nighttime synthetic versions of the same set of images. Extensive experiments demonstrate the improvement and efficiency of our scene parser over state-of-the-art methods that use a similar semi-supervised approach on the Nighttime Driving Test benchmark. Notably, our proposed method uses only one tenth of the amount of labeled and unlabeled data compared with previous methods. Code is available at https://github.com/danhntd/Label_Transfer_Scene_Parser.git.
Citations: 0
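The self-training step above trains the parser further on pseudo-labels inferred from unlabeled nighttime scenes. A minimal sketch of standard confidence-thresholded pseudo-labeling is shown below; the threshold, ignore index, and toy parser are assumptions, and the paper's exact procedure may differ.

```python
# Minimal sketch (assumptions, not the released code): generate pseudo-labels for
# unlabeled nighttime frames from a scene parser's soft predictions, keeping only
# pixels whose confidence exceeds a threshold.
import torch

IGNORE_INDEX = 255  # pixels with this label are skipped by the segmentation loss

@torch.no_grad()
def make_pseudo_labels(model: torch.nn.Module, images: torch.Tensor,
                       threshold: float = 0.9) -> torch.Tensor:
    """images: (N, 3, H, W) nighttime frames -> (N, H, W) integer label map,
    with low-confidence pixels marked IGNORE_INDEX."""
    probs = torch.softmax(model(images), dim=1)          # (N, C, H, W)
    conf, labels = probs.max(dim=1)                      # per-pixel confidence / class
    labels[conf < threshold] = IGNORE_INDEX
    return labels

if __name__ == "__main__":
    toy_parser = torch.nn.Conv2d(3, 19, kernel_size=1)   # stands in for the scene parser
    night = torch.randn(2, 3, 64, 64)
    pl = make_pseudo_labels(toy_parser, night, threshold=0.2)
    print(pl.shape, (pl == IGNORE_INDEX).float().mean().item())
```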
RFSC-net: Re-parameterization forward semantic compensation network in low-light environments
IF 4.2 | CAS Q3 (Computer Science)
Image and Vision Computing, Volume 151, Article 105271 | Pub Date: 2024-09-07 | DOI: 10.1016/j.imavis.2024.105271
Wenhao Zhang, Huiying Xu, Xinzhong Zhu, Yunzhong Si, Yao Dong, Xiao Huang, Hongbo Li
Abstract: Although detectors currently perform well in well-lit conditions, their accuracy decreases in low-light environments due to insufficient object information. To address this issue, we propose the Re-parameterization Forward Semantic Compensation Network (RFSC-Net). We propose the Re-parameterization Residual Efficient Layer Aggregation Networks (RSELAN) for feature extraction, which integrates the concepts of re-parameterization and Efficient Layer Aggregation Networks (ELAN). While focusing on the fusion of feature maps of the same dimension, it also incorporates upward fusion of lower-level feature maps, enhancing the detailed texture information in higher-level features. Our proposed Forward Semantic Compensation Feature Fusion (FSCFF) network reduces interference from high-level to low-level semantic information, retaining finer details to improve detection accuracy in low-light conditions. Experiments on the low-light ExDark and DarkFace datasets show that RFSC-Net improves mAP by 2% on ExDark and 0.5% on DarkFace over the YOLOv8n baseline, without an increase in parameter count. Additionally, AP50 is improved by 2.1% on ExDark and 1.1% on DarkFace, with a detection latency of only 3.7 ms on ExDark.
Citations: 0
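RSELAN relies on re-parameterization, where multi-branch blocks used during training collapse into a single convolution at inference. The sketch below shows the standard fusion of a parallel 1x1 branch into a 3x3 convolution to illustrate that mechanism; it is the generic trick, not RFSC-Net's specific block.

```python
# Minimal sketch (standard re-parameterization, not RFSC-Net itself): fold a
# parallel 1x1 convolution branch into a single equivalent 3x3 convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_3x3_1x1(conv3: nn.Conv2d, conv1: nn.Conv2d) -> nn.Conv2d:
    """Return a single 3x3 conv equivalent to conv3(x) + conv1(x)."""
    fused = nn.Conv2d(conv3.in_channels, conv3.out_channels, 3, padding=1, bias=True)
    with torch.no_grad():
        # Pad the 1x1 kernel to 3x3 (its weight lands on the center tap) and sum.
        fused.weight.copy_(conv3.weight + F.pad(conv1.weight, [1, 1, 1, 1]))
        fused.bias.copy_(conv3.bias + conv1.bias)
    return fused

if __name__ == "__main__":
    c3 = nn.Conv2d(8, 16, 3, padding=1)
    c1 = nn.Conv2d(8, 16, 1)
    x = torch.randn(1, 8, 32, 32)
    fused = fuse_3x3_1x1(c3, c1)
    print(torch.allclose(c3(x) + c1(x), fused(x), atol=1e-5))   # True
```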
Automatic segmentation of deep endometriosis in the rectosigmoid using deep learning
IF 4.2 | CAS Q3 (Computer Science)
Image and Vision Computing, Volume 151, Article 105261 | Pub Date: 2024-09-06 | DOI: 10.1016/j.imavis.2024.105261
Weslley Kelson Ribeiro Figueredo, Aristófanes Corrêa Silva, Anselmo Cardoso de Paiva, João Otávio Bandeira Diniz, Alice Brandão, Marco Aurelio Pinho Oliveira
Abstract: Endometriosis is an inflammatory disease that causes several symptoms, such as infertility and constant pain. While biopsy remains the gold standard for diagnosing endometriosis, imaging tests, particularly magnetic resonance, are becoming increasingly prominent, especially in cases of deep infiltrating disease. However, precise and accurate MRI results require a skilled radiologist. In this study, we employ a dataset we built to propose an automated method for classifying patients with endometriosis and segmenting endometriosis lesions in magnetic resonance images of the rectum and sigmoid colon using image processing and deep learning techniques. Our goals are to assist in diagnosis, to map the extent of the disease before a surgical procedure, and to help reduce the need for invasive diagnostic methods. The method consists of the following steps: rectosigmoid ROI extraction, image classification, initial lesion segmentation, lesion ROI extraction, and final lesion segmentation. ROI extraction is employed to limit the search area for lesions. Using an ensemble of networks, classification of images and patients, with or without endometriosis, achieved accuracies of 87.46% and 96.67%, respectively. One of these networks is a proposed modification of VGG-16. The initial segmentation step produces candidate regions for lesions using TransUnet, achieving a Dice index of 51%. These regions serve as the basis for extracting a new ROI. In the final lesion segmentation, also using TransUnet, we obtain a Dice index of 65.44%.
Citations: 0
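The pipeline above chains five stages: rectosigmoid ROI extraction, classification, initial segmentation, lesion ROI extraction, and final segmentation. The sketch below wires those stages together with placeholder callables to show the control flow only; none of the stand-ins reflect the authors' actual models.

```python
# Minimal sketch (placeholders, not the authors' models): the five-stage flow the
# abstract describes, with each stage passed in as a stand-in callable.
import numpy as np

def crop(image: np.ndarray, box) -> np.ndarray:
    y0, y1, x0, x1 = box
    return image[y0:y1, x0:x1]

def segment_pipeline(mri_slice: np.ndarray, roi_detector, classifier,
                     coarse_segmenter, fine_segmenter, threshold: float = 0.5):
    roi_box = roi_detector(mri_slice)                 # 1) rectosigmoid ROI
    roi = crop(mri_slice, roi_box)
    if classifier(roi) < threshold:                   # 2) endometriosis present?
        return None                                   #    negative slice: no lesion mask
    coarse_mask = coarse_segmenter(roi)               # 3) candidate lesion regions
    ys, xs = np.nonzero(coarse_mask)
    if len(ys) == 0:
        return None
    lesion_box = (ys.min(), ys.max() + 1, xs.min(), xs.max() + 1)   # 4) lesion ROI
    return fine_segmenter(crop(roi, lesion_box))      # 5) final lesion mask

if __name__ == "__main__":
    dummy = np.zeros((256, 256), dtype=np.float32)
    out = segment_pipeline(
        dummy,
        roi_detector=lambda img: (64, 192, 64, 192),
        classifier=lambda roi: 0.9,
        coarse_segmenter=lambda roi: (roi > -1).astype(np.uint8),   # all ones
        fine_segmenter=lambda roi: roi > 0.5)
    print(None if out is None else out.shape)
```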