arXiv - EE - Image and Video Processing: Latest Publications

PSFHS Challenge Report: Pubic Symphysis and Fetal Head Segmentation from Intrapartum Ultrasound Images
arXiv - EE - Image and Video Processing | Pub Date: 2024-09-17 | DOI: arxiv-2409.10980
Jieyun Bai, Zihao Zhou, Zhanhong Ou, Gregor Koehler, Raphael Stock, Klaus Maier-Hein, Marawan Elbatel, Robert Martí, Xiaomeng Li, Yaoyang Qiu, Panjie Gou, Gongping Chen, Lei Zhao, Jianxun Zhang, Yu Dai, Fangyijie Wang, Guénolé Silvestre, Kathleen Curran, Hongkun Sun, Jing Xu, Pengzhou Cai, Lu Jiang, Libin Lan, Dong Ni, Mei Zhong, Gaowen Chen, Víctor M. Campello, Yaosheng Lu, Karim Lekadir
Abstract: Segmentation of the fetal and maternal structures, particularly in intrapartum ultrasound imaging as advocated by the International Society of Ultrasound in Obstetrics and Gynecology (ISUOG) for monitoring labor progression, is a crucial first step for quantitative diagnosis and clinical decision-making. This requires specialized analysis by obstetrics professionals, in a task that i) is highly time- and cost-consuming and ii) often yields inconsistent results. The utility of automatic segmentation algorithms for biometry has been proven, though existing results remain suboptimal. To push forward advancements in this area, the Grand Challenge on Pubic Symphysis-Fetal Head Segmentation (PSFHS) was held alongside the 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023). This challenge aimed to enhance the development of automatic segmentation algorithms at an international scale, providing the largest dataset to date with 5,101 intrapartum ultrasound images collected from two ultrasound machines across three hospitals from two institutions. The scientific community's enthusiastic participation led to the selection of the top 8 out of 179 entries from 193 registrants in the initial phase to proceed to the competition's second stage. These algorithms have elevated the state of the art in automatic PSFHS from intrapartum ultrasound images. A thorough analysis of the results pinpointed ongoing challenges in the field and outlined recommendations for future work. The top solutions and the complete dataset remain publicly available, fostering further advancements in automatic segmentation and biometry for intrapartum ultrasound imaging.
Citations: 0
TTT-Unet: Enhancing U-Net with Test-Time Training Layers for biomedical image segmentation
arXiv - EE - Image and Video Processing | Pub Date: 2024-09-17 | DOI: arxiv-2409.11299
Rong Zhou, Zhengqing Yuan, Zhiling Yan, Weixiang Sun, Kai Zhang, Yiwei Li, Yanfang Ye, Xiang Li, Lifang He, Lichao Sun
Abstract: Biomedical image segmentation is crucial for accurately diagnosing and analyzing various diseases. However, Convolutional Neural Networks (CNNs) and Transformers, the most commonly used architectures for this task, struggle to effectively capture long-range dependencies due to the inherent locality of CNNs and the computational complexity of Transformers. To address this limitation, we introduce TTT-Unet, a novel framework that integrates Test-Time Training (TTT) layers into the traditional U-Net architecture for biomedical image segmentation. TTT-Unet dynamically adjusts model parameters at test time, enhancing the model's ability to capture both local and long-range features. We evaluate TTT-Unet on multiple medical imaging datasets, including 3D abdominal organ segmentation in CT and MR images, instrument segmentation in endoscopy images, and cell segmentation in microscopy images. The results demonstrate that TTT-Unet consistently outperforms state-of-the-art CNN-based and Transformer-based segmentation models across all tasks. The code is available at https://github.com/rongzhou7/TTT-Unet.
Citations: 0
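To make the test-time training idea above concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' released code): a small convolutional adapter whose parameters are updated at inference with a self-supervised masked-reconstruction loss, while the surrounding network stays frozen. The class and method names (TTTAdapter, ttt_step) and the proxy task are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TTTAdapter(nn.Module):
    """Small residual adapter whose weights are tuned at inference time
    with a self-supervised masked-reconstruction proxy task (illustrative only)."""
    def __init__(self, channels):
        super().__init__()
        self.adapt = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.recon = nn.Conv2d(channels, channels, 1)  # head used only for the proxy loss

    def forward(self, x):
        return x + self.adapt(x)  # residual feature adaptation

    def ttt_step(self, x, lr=1e-4, steps=1):
        """Run a few gradient steps on this adapter only, for the current test input."""
        opt = torch.optim.SGD(self.parameters(), lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            mask = (torch.rand_like(x[:, :1]) > 0.25).float()  # hide ~25% of positions
            loss = F.mse_loss(self.recon(self.adapt(x * mask)), x)
            loss.backward()
            opt.step()

# usage: adapt on the test features, then run the normal forward pass
feat = torch.randn(1, 64, 32, 32)
adapter = TTTAdapter(64)
adapter.ttt_step(feat, steps=2)
print(adapter(feat).shape)  # torch.Size([1, 64, 32, 32])
```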
Temporal As a Plugin: Unsupervised Video Denoising with Pre-Trained Image Denoisers
arXiv - EE - Image and Video Processing | Pub Date: 2024-09-17 | DOI: arxiv-2409.11256
Zixuan Fu, Lanqing Guo, Chong Wang, Yufei Wang, Zhihao Li, Bihan Wen
Abstract: Recent advancements in deep learning have shown impressive results in image and video denoising, leveraging extensive pairs of noisy and noise-free data for supervision. However, the challenge of acquiring paired videos for dynamic scenes hampers the practical deployment of deep video denoising techniques. In contrast, this obstacle is less pronounced in image denoising, where paired data is more readily available. Thus, a well-trained image denoiser could serve as a reliable spatial prior for video denoising. In this paper, we propose a novel unsupervised video denoising framework, named "Temporal As a Plugin" (TAP), which integrates tunable temporal modules into a pre-trained image denoiser. By incorporating temporal modules, our method can harness temporal information across noisy frames, complementing its power of spatial denoising. Furthermore, we introduce a progressive fine-tuning strategy that refines each temporal module using the generated pseudo clean video frames, progressively enhancing the network's denoising performance. Compared to other unsupervised video denoising methods, our framework demonstrates superior performance on both sRGB and raw video denoising datasets.
Citations: 0
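The abstract's core idea, tunable temporal modules slotted between the stages of a frozen, pre-trained spatial denoiser, can be illustrated with the following hedged PyTorch sketch. The module design (a residual 3D convolution over the frame axis) and all names are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TemporalModule(nn.Module):
    """Residual 3D convolution that mixes information across neighbouring frames."""
    def __init__(self, channels):
        super().__init__()
        self.mix = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0))

    def forward(self, x):        # x: (B, C, T, H, W)
        return x + self.mix(x)

class TAPDenoiser(nn.Module):
    def __init__(self, spatial_stages, channels):
        super().__init__()
        self.spatial = nn.ModuleList(spatial_stages)   # pre-trained image denoiser stages
        for p in self.spatial.parameters():
            p.requires_grad_(False)                    # the spatial prior stays frozen
        self.temporal = nn.ModuleList(
            [TemporalModule(channels) for _ in spatial_stages])  # tunable plug-ins

    def forward(self, frames):   # frames: (B, T, C, H, W)
        b, t, c, h, w = frames.shape
        x = frames.reshape(b * t, c, h, w)
        for spatial, temporal in zip(self.spatial, self.temporal):
            x = spatial(x)                                        # per-frame spatial denoising
            x = x.reshape(b, t, -1, h, w).permute(0, 2, 1, 3, 4)  # to (B, C, T, H, W)
            x = temporal(x)                                       # cross-frame mixing
            x = x.permute(0, 2, 1, 3, 4).reshape(b * t, -1, h, w)
        return x.reshape(b, t, -1, h, w)

stages = [nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.ReLU()) for _ in range(2)]
model = TAPDenoiser(stages, channels=3)
print(model(torch.randn(1, 5, 3, 32, 32)).shape)  # torch.Size([1, 5, 3, 32, 32])
```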
Lite-FBCN: Lightweight Fast Bilinear Convolutional Network for Brain Disease Classification from MRI Image
arXiv - EE - Image and Video Processing | Pub Date: 2024-09-17 | DOI: arxiv-2409.10952
Dewinda Julianensi Rumala, Reza Fuad Rachmadi, Anggraini Dwi Sensusiati, I Ketut Eddy Purnama
Abstract: Achieving high accuracy with computational efficiency in brain disease classification from Magnetic Resonance Imaging (MRI) scans is challenging, particularly when both coarse and fine-grained distinctions are crucial. Current deep learning methods often struggle to balance accuracy with computational demands. We propose Lite-FBCN, a novel Lightweight Fast Bilinear Convolutional Network designed to address this issue. Unlike traditional dual-network bilinear models, Lite-FBCN utilizes a single-network architecture, significantly reducing computational load. Lite-FBCN leverages lightweight, pre-trained CNNs fine-tuned to extract relevant features and incorporates a channel reducer layer before bilinear pooling, minimizing feature map dimensionality and resulting in a compact bilinear vector. Extensive evaluations on cross-validation and hold-out data demonstrate that Lite-FBCN not only surpasses baseline CNNs but also outperforms existing bilinear models. Lite-FBCN with MobileNetV1 attains 98.10% accuracy in cross-validation and 69.37% on hold-out data (a 3% improvement over the baseline). UMAP visualizations further confirm its effectiveness in distinguishing closely related brain disease classes. Moreover, its optimal trade-off between performance and computational efficiency positions Lite-FBCN as a promising solution for enhancing diagnostic capabilities in resource-constrained and/or real-time clinical environments.
Citations: 0
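A compact sketch of the single-network bilinear pipeline described above: a 1x1 channel-reducer convolution shrinks the backbone features before outer-product (bilinear) pooling, followed by the usual signed square-root and L2 normalization. The backbone is omitted and all sizes are placeholder assumptions, not the released Lite-FBCN configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FastBilinearHead(nn.Module):
    def __init__(self, in_channels, reduced_channels, num_classes):
        super().__init__()
        self.reducer = nn.Conv2d(in_channels, reduced_channels, kernel_size=1)  # channel reducer
        self.fc = nn.Linear(reduced_channels * reduced_channels, num_classes)

    def forward(self, feat):                     # feat: (B, C, H, W) from a single CNN backbone
        x = self.reducer(feat)                   # (B, r, H, W)
        b, r, h, w = x.shape
        x = x.reshape(b, r, h * w)
        bilinear = torch.bmm(x, x.transpose(1, 2)) / (h * w)   # (B, r, r) outer-product pooling
        bilinear = bilinear.reshape(b, -1)                      # compact bilinear vector
        bilinear = torch.sign(bilinear) * torch.sqrt(bilinear.abs() + 1e-10)  # signed sqrt
        bilinear = F.normalize(bilinear, dim=1)                 # L2 normalization
        return self.fc(bilinear)

head = FastBilinearHead(in_channels=1024, reduced_channels=64, num_classes=4)
print(head(torch.randn(2, 1024, 7, 7)).shape)    # torch.Size([2, 4])
```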
CUNSB-RFIE: Context-aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement
arXiv - EE - Image and Video Processing | Pub Date: 2024-09-17 | DOI: arxiv-2409.10966
Xuanzhao Dong, Vamsi Krishna Vasa, Wenhui Zhu, Peijie Qiu, Xiwen Chen, Yi Su, Yujian Xiong, Zhangsihao Yang, Yanxi Chen, Yalin Wang
Abstract: Retinal fundus photography is significant in diagnosing and monitoring retinal diseases. However, systemic imperfections and operator/patient-related factors can hinder the acquisition of high-quality retinal images. Previous efforts in retinal image enhancement primarily relied on GANs, which are limited by the trade-off between training stability and output diversity. In contrast, the Schrödinger Bridge (SB) offers a more stable solution by utilizing Optimal Transport (OT) theory to model a stochastic differential equation (SDE) between two arbitrary distributions. This allows SB to effectively transform low-quality retinal images into their high-quality counterparts. In this work, we leverage the SB framework to propose an image-to-image translation pipeline for retinal image enhancement. Additionally, previous methods often fail to capture fine structural details, such as blood vessels. To address this, we enhance our pipeline by introducing Dynamic Snake Convolution, whose tortuous receptive field can better preserve tubular structures. We name the resulting retinal fundus image enhancement framework the Context-aware Unpaired Neural Schrödinger Bridge (CUNSB-RFIE). To the best of our knowledge, this is the first endeavor to use the SB approach for retinal image enhancement. Experimental results on a large-scale dataset demonstrate the advantage of the proposed method compared to several state-of-the-art supervised and unsupervised methods in terms of image quality and performance on downstream tasks. The code is available at https://github.com/Retinal-Research/CUNSB-RFIE.
Citations: 0
Towards Effective User Attribution for Latent Diffusion Models via Watermark-Informed Blending
arXiv - EE - Image and Video Processing | Pub Date: 2024-09-17 | DOI: arxiv-2409.10958
Yongyang Pan, Xiaohong Liu, Siqi Luo, Yi Xin, Xiao Guo, Xiaoming Liu, Xiongkuo Min, Guangtao Zhai
Abstract: Rapid advancements in multimodal large language models have enabled the creation of hyper-realistic images from textual descriptions. However, these advancements also raise significant concerns about unauthorized use, which hinders their broader distribution. Traditional watermarking methods often require complex integration or degrade image quality. To address these challenges, we introduce a novel framework, Towards Effective user Attribution for latent diffusion models via Watermark-Informed Blending (TEAWIB). TEAWIB incorporates a unique ready-to-use configuration approach that allows seamless integration of user-specific watermarks into generative models. This approach ensures that each user can directly apply a pre-configured set of parameters to the model without altering the original model parameters or compromising image quality. Additionally, noise and augmentation operations are embedded at the pixel level to further secure and stabilize watermarked images. Extensive experiments validate the effectiveness of TEAWIB, showcasing state-of-the-art performance in perceptual quality and attribution accuracy.
Citations: 0
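One way to read the "ready-to-use configuration" described above is that user-specific watermark parameters are stored separately and blended with the frozen base weights at load time, leaving the original checkpoint untouched. The following toy sketch illustrates that reading only; the function name, the additive blending rule, and the alpha parameter are all assumptions, not TEAWIB's actual mechanism.

```python
import torch

def blend_user_weights(base_state, user_deltas, alpha=1.0):
    """Return a new state dict = base weights + alpha * user-specific watermark deltas."""
    blended = {k: v.clone() for k, v in base_state.items()}
    for name, delta in user_deltas.items():
        blended[name] = blended[name] + alpha * delta
    return blended

# usage with a toy "decoder" layer standing in for part of a generative model
decoder = torch.nn.Conv2d(4, 3, 3, padding=1)
base = decoder.state_dict()
user_42 = {"weight": 1e-3 * torch.randn_like(base["weight"])}   # per-user watermark parameters
decoder.load_state_dict(blend_user_weights(base, user_42))       # base weights are never edited in place
```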
SkinMamba: A Precision Skin Lesion Segmentation Architecture with Cross-Scale Global State Modeling and Frequency Boundary Guidance
arXiv - EE - Image and Video Processing | Pub Date: 2024-09-17 | DOI: arxiv-2409.10890
Shun Zou, Mingya Zhang, Bingjian Fan, Zhengyi Zhou, Xiuguo Zou
Abstract: Skin lesion segmentation is a crucial method for identifying early skin cancer. In recent years, both convolutional neural network (CNN) and Transformer-based methods have been widely applied. Moreover, combining CNN and Transformer effectively integrates global and local relationships, but remains limited by the quadratic complexity of the Transformer. To address this, we propose a hybrid architecture based on Mamba and CNN, called SkinMamba. It maintains linear complexity while offering powerful long-range dependency modeling and local feature extraction capabilities. Specifically, we introduce the Scale Residual State Space Block (SRSSB), which captures global contextual relationships and cross-scale information exchange at a macro level, enabling expert communication in a global state. This effectively addresses challenges in skin lesion segmentation related to varying lesion sizes and inconspicuous target areas. Additionally, to mitigate boundary blurring and information loss during model downsampling, we introduce the Frequency Boundary Guided Module (FBGM), providing sufficient boundary priors to guide precise boundary segmentation, while also using the retained information to assist the decoder in the decoding process. Finally, we conducted comparative and ablation experiments on two public lesion segmentation datasets (ISIC2017 and ISIC2018), and the results demonstrate the strong competitiveness of SkinMamba in skin lesion segmentation tasks. The code is available at https://github.com/zs1314/SkinMamba.
Citations: 0
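As an illustration of where a frequency-domain boundary prior such as the FBGM's might come from, the sketch below extracts high-frequency content with an FFT high-pass mask. This is a generic technique under assumed parameters (the cutoff value, the radial mask shape), not the module proposed in the paper.

```python
import torch

def high_frequency_prior(x, cutoff=0.1):
    """Keep only frequencies above `cutoff` (fraction of the spectrum radius) as a boundary cue.
    x: (B, C, H, W) image or feature map."""
    b, c, h, w = x.shape
    freq = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, h, device=x.device),
        torch.linspace(-1, 1, w, device=x.device),
        indexing="ij")
    radius = torch.sqrt(xx ** 2 + yy ** 2)
    mask = (radius > cutoff).to(x.dtype)                 # suppress low frequencies
    freq = freq * mask
    return torch.fft.ifft2(torch.fft.ifftshift(freq, dim=(-2, -1))).real

edges = high_frequency_prior(torch.randn(1, 3, 64, 64))  # crude edge/boundary map
print(edges.shape)  # torch.Size([1, 3, 64, 64])
```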
Retinal Vessel Segmentation with Deep Graph and Capsule Reasoning
arXiv - EE - Image and Video Processing | Pub Date: 2024-09-17 | DOI: arxiv-2409.11508
Xinxu Wei, Xi Lin, Haiyun Liu, Shixuan Zhao, Yongjie Li
Abstract: Effective retinal vessel segmentation requires a sophisticated integration of global contextual awareness and local vessel continuity. To address this challenge, we propose the Graph Capsule Convolution Network (GCC-UNet), which merges capsule convolutions with CNNs to capture both local and global features. The Graph Capsule Convolution operator is specifically designed to enhance the representation of global context, while the Selective Graph Attention Fusion module ensures seamless integration of local and global information. To further improve vessel continuity, we introduce the Bottleneck Graph Attention module, which incorporates Channel-wise and Spatial Graph Attention mechanisms. The Multi-Scale Graph Fusion module adeptly combines features from various scales. Our approach has been rigorously validated through experiments on widely used public datasets, with ablation studies confirming the efficacy of each component. Comparative results highlight GCC-UNet's superior performance over existing methods, setting a new benchmark in retinal vessel segmentation. Notably, this work represents the first integration of vanilla, graph, and capsule convolutional techniques in the domain of medical image segmentation.
Citations: 0
Gradient-free Post-hoc Explainability Using Distillation Aided Learnable Approach
arXiv - EE - Image and Video Processing | Pub Date: 2024-09-17 | DOI: arxiv-2409.11123
Debarpan Bhattacharya, Amir H. Poorjam, Deepak Mittal, Sriram Ganapathy
Abstract: The recent advancements in artificial intelligence (AI), with the release of several large models having only query access, make a strong case for explainability of deep models in a post-hoc, gradient-free manner. In this paper, we propose a framework, named distillation aided explainability (DAX), that attempts to generate a saliency-based explanation in a model-agnostic, gradient-free application. The DAX approach poses the problem of explanation in a learnable setting with a mask generation network and a distillation network. The mask generation network learns to generate the multiplier mask that finds the salient regions of the input, while the student distillation network aims to approximate the local behavior of the black-box model. We propose a joint optimization of the two networks in the DAX framework using the locally perturbed input samples, with the targets derived from input-output access to the black-box model. We extensively evaluate DAX across different modalities (image and audio), in a classification setting, using a diverse set of evaluations (intersection over union with ground truth, deletion-based and subjective human evaluation based measures) and benchmark it with respect to 9 different methods. In these evaluations, DAX significantly outperforms the existing approaches on all modalities and evaluation metrics.
Citations: 0
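The two-network setup described above can be sketched as follows: a mask generator and a student are optimized jointly on locally perturbed inputs, using only forward queries to the black box as targets, so no black-box gradients are needed. The loss weighting, the sparsity term, and the toy networks are assumptions for illustration, not the DAX objective itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

black_box = lambda x: x.mean(dim=(2, 3))          # stand-in model, query (forward) access only

mask_net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 3))
opt = torch.optim.Adam(list(mask_net.parameters()) + list(student.parameters()), lr=1e-3)

x = torch.randn(4, 3, 16, 16)                      # inputs to be explained
for _ in range(10):
    noisy = x + 0.1 * torch.randn_like(x)          # locally perturbed samples
    with torch.no_grad():
        target = black_box(noisy)                  # black-box outputs used as targets
    mask = mask_net(noisy)                         # multiplier (saliency) mask
    distill = F.mse_loss(student(noisy), target)          # student mimics local behaviour
    fidelity = F.mse_loss(student(noisy * mask), target)  # masked input should still explain the output
    sparsity = mask.mean()                                 # keep the salient region small
    loss = distill + fidelity + 0.1 * sparsity
    opt.zero_grad(); loss.backward(); opt.step()

saliency = mask_net(x)                             # explanation maps for x
print(saliency.shape)                              # torch.Size([4, 1, 16, 16])
```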
HoloTile RGB: Ultra-fast, Speckle-Free RGB Computer Generated Holography
arXiv - EE - Image and Video Processing | Pub Date: 2024-09-17 | DOI: arxiv-2409.11049
Andreas Erik Gejl Madsen, Jesper Glückstad
Abstract: We demonstrate the first use of the HoloTile Computer Generated Holography (CGH) modality on multicolor targets. Taking advantage of the sub-hologram tiling and Point Spread Function (PSF) shaping of HoloTile allows for the reconstruction of high-fidelity, pseudo-digital RGB images, with well-defined output pixels, without the need for temporal averaging. For each wavelength, the target channels are scaled appropriately, using the same output pixel size. We employ a Stochastic Gradient Descent (SGD) hologram generation algorithm for each wavelength, and display them sequentially on a HoloEye GAEA 2.1 Spatial Light Modulator (SLM) in Color Field Sequential (CFS) phase modulation mode. As such, we get full 8-bit phase modulation at 60 Hz for each wavelength. The reconstructions are projected onto a camera sensor where each RGB image is captured at once.
Citations: 0
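For readers unfamiliar with SGD-based CGH, here is a bare-bones sketch of optimizing a phase-only hologram for a single wavelength under a simple far-field (FFT) propagation model. It illustrates only the per-wavelength SGD step mentioned above; HoloTile's sub-hologram tiling and PSF shaping are not reproduced, and the energy-matching and loss choices are assumptions.

```python
import torch

target = torch.zeros(256, 256)
target[96:160, 96:160] = 1.0                        # toy target amplitude (a bright square)

phase = torch.zeros(256, 256, requires_grad=True)   # phase-only SLM pattern to optimize
opt = torch.optim.Adam([phase], lr=0.05)

for _ in range(200):
    field = torch.exp(1j * phase)                   # unit-amplitude, phase-only hologram
    recon = torch.fft.fftshift(torch.fft.fft2(field))  # far-field reconstruction
    amp = recon.abs()
    amp = amp / amp.norm() * target.norm()          # match overall energy before comparing
    loss = torch.mean((amp - target) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()

print(loss.item())
# In a color field sequential scheme, one such phase pattern would be optimized per
# wavelength (R, G, B) and displayed in sequence on the SLM.
```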