TabMixer: Noninvasive Estimation of the Mean Pulmonary Artery Pressure via Imaging and Tabular Data Mixing
Michal K. Grzeszczyk, Przemysław Korzeniowski, Samer Alabed, Andrew J. Swift, Tomasz Trzciński, Arkadiusz Sitek
arXiv:2409.07564 (2024-09-11)

Right Heart Catheterization is the gold-standard procedure for diagnosing Pulmonary Hypertension by measuring mean Pulmonary Artery Pressure (mPAP). It is invasive, costly, time-consuming and carries risks. In this paper, for the first time, we explore the estimation of mPAP from videos of noninvasive Cardiac Magnetic Resonance Imaging. To enhance the predictive capabilities of Deep Learning models used for this task, we introduce an additional modality in the form of demographic features and clinical measurements. Inspired by all-Multilayer Perceptron architectures, we present TabMixer, a novel module enabling the integration of imaging and tabular data through spatial, temporal and channel mixing. Specifically, we present the first approach that utilizes Multilayer Perceptrons to interchange tabular information with imaging features in vision models. We test TabMixer for mPAP estimation and show that it enhances the performance of Convolutional Neural Networks, 3D-MLP and Vision Transformers while being competitive with previous modules for imaging and tabular data. Our approach has the potential to improve clinical processes involving both modalities, particularly noninvasive mPAP estimation, thus significantly enhancing the quality of life for individuals affected by Pulmonary Hypertension. Source code for TabMixer is available at https://github.com/SanoScience/TabMixer.
A comprehensive study on Blood Cancer detection and classification using Convolutional Neural Network
Md Taimur Ahad, Sajib Bin Mamun, Sumaya Mustofa, Bo Song, Yan Li
arXiv:2409.06689 (2024-09-10)

Over the years, several efficient Convolutional Neural Network (CNN) architectures, such as DenseNet201, InceptionV3, ResNet152v2, SEresNet152, VGG19 and Xception, have gained significant attention in object detection due to their performance. Moreover, CNN paradigms have expanded to transfer learning and ensemble models built from original CNN architectures. Research studies suggest that transfer learning and ensemble models are capable of increasing the accuracy of deep learning (DL) models. However, very few studies have conducted comprehensive experiments utilizing these techniques to detect and localize blood malignancies. Realizing this gap, this study conducted three experiments: in the first, six original CNNs were used; in the second, transfer learning; and in the third, a novel ensemble model, DIX (DenseNet201, InceptionV3, and Xception), was developed to detect and classify blood cancer. The statistical results suggest that DIX outperformed both the original and transfer-learning models, providing an accuracy of 99.12%. However, the study also reports a negative result for transfer learning, as it did not increase the accuracy of the original CNNs. Like many other cancers, blood cancer requires timely identification for effective treatment planning and improved survival chances. The high accuracy in detecting and classifying blood cancer using CNNs suggests that CNN models are promising for blood cancer detection. This research is significant for biomedical engineering, computer-aided disease diagnosis, and ML-based disease detection.
{"title":"Universal End-to-End Neural Network for Lossy Image Compression","authors":"Bouzid Arezki, Fangchen Feng, Anissa Mokraoui","doi":"arxiv-2409.06586","DOIUrl":"https://doi.org/arxiv-2409.06586","url":null,"abstract":"This paper presents variable bitrate lossy image compression using a\u0000VAE-based neural network. An adaptable image quality adjustment strategy is\u0000proposed. The key innovation involves adeptly adjusting the input scale\u0000exclusively during the inference process, resulting in an exceptionally\u0000efficient rate-distortion mechanism. Through extensive experimentation, across\u0000diverse VAE-based compression architectures (CNN, ViT) and training\u0000methodologies (MSE, SSIM), our approach exhibits remarkable universality. This\u0000success is attributed to the inherent generalization capacity of neural\u0000networks. Unlike methods that adjust model architecture or loss functions, our\u0000approach emphasizes simplicity, reducing computational complexity and memory\u0000requirements. The experiments not only highlight the effectiveness of our\u0000approach but also indicate its potential to drive advancements in variable-rate\u0000neural network lossy image compression methodologies.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interactive 3D Segmentation for Primary Gross Tumor Volume in Oropharyngeal Cancer
Mikko Saukkoriipi, Jaakko Sahlsten, Joel Jaskari, Lotta Orasmaa, Jari Kangas, Nastaran Rasouli, Roope Raisamo, Jussi Hirvonen, Helena Mehtonen, Jorma Järnstedt, Antti Mäkitie, Mohamed Naser, Clifton Fuller, Benjamin Kann, Kimmo Kaski
arXiv:2409.06605 (2024-09-10)

The main treatment modality for oropharyngeal cancer (OPC) is radiotherapy, where accurate segmentation of the primary gross tumor volume (GTVp) is essential. However, accurate GTVp segmentation is challenging due to significant interobserver variability and the time-consuming nature of manual annotation, while fully automated methods can occasionally fail. An interactive deep learning (DL) model offers the advantage of automatic high-performance segmentation with the flexibility for user correction when necessary. In this study, we examine interactive DL for GTVp segmentation in OPC. We implement state-of-the-art algorithms and propose a novel two-stage Interactive Click Refinement (2S-ICR) framework. Using the 2021 HEad and neCK TumOR (HECKTOR) dataset for development and an external dataset from The University of Texas MD Anderson Cancer Center for evaluation, the 2S-ICR framework achieves a Dice similarity coefficient of 0.713 ± 0.152 without user interaction and 0.824 ± 0.099 after five interactions, outperforming existing methods in both cases.
Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models
Siyu Zhai, Zhibo He, Xiaofeng Cong, Junming Hou, Jie Gui, Jian Wei You, Xin Gong, James Tin-Yau Kwok, Yuan Yan Tang
arXiv:2409.06420 (2024-09-10)

Learning-based methods for underwater image enhancement (UWIE) have undergone extensive exploration. However, learning-based models are usually vulnerable to adversarial examples, and UWIE models are no exception. To the best of our knowledge, there is no comprehensive study on the adversarial robustness of UWIE models, which indicates that UWIE models are potentially under the threat of adversarial attacks. In this paper, we propose a general adversarial attack protocol. We make a first attempt to conduct adversarial attacks on five well-designed UWIE models on three common underwater image benchmark datasets. Considering the scattering and absorption of light in the underwater environment, there exists a strong correlation between color correction and underwater image enhancement. On this basis, we also design two effective UWIE-oriented adversarial attack methods, Pixel Attack and Color Shift Attack, targeting different color spaces. The results show that the five models exhibit varying degrees of vulnerability to adversarial attacks, and well-designed small perturbations on degraded images are capable of preventing UWIE models from generating enhanced results. Furthermore, we conduct adversarial training on these models and successfully mitigate the effectiveness of the attacks. In summary, we reveal the adversarial vulnerability of UWIE models and propose a new evaluation dimension for UWIE models.
Ordinal Learning: Longitudinal Attention Alignment Model for Predicting Time to Future Breast Cancer Events from Mammograms
Xin Wang, Tao Tan, Yuan Gao, Eric Marcus, Luyi Han, Antonio Portaluri, Tianyu Zhang, Chunyao Lu, Xinglong Liang, Regina Beets-Tan, Jonas Teuwen, Ritse Mann
arXiv:2409.06887 (2024-09-10)

Precision breast cancer (BC) risk assessment is crucial for developing individualized screening and prevention. Despite the promising potential of recent mammogram (MG) based deep learning models in predicting BC risk, they mostly overlook the 'time-to-future-event' ordering among patients and offer limited insight into how they track historical changes in breast tissue, thereby limiting their clinical application. In this work, we propose a novel method, named OA-BreaCR, to precisely model the ordinal relationship of the time to and between BC events while incorporating longitudinal breast tissue changes in a more explainable manner. We validate our method on the public EMBED and in-house datasets, comparing it with existing BC risk prediction and time prediction methods. Our ordinal learning method OA-BreaCR outperforms existing methods on both BC risk and time-to-future-event prediction tasks. Additionally, ordinal heatmap visualizations show the model's attention over time. Our findings underscore the importance of interpretable and precise risk assessment for enhancing BC screening and prevention efforts. The code will be publicly accessible.
{"title":"PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation","authors":"Yin Hu, Xianping Ma, Jialu Sui, Man-On Pun","doi":"arxiv-2409.06309","DOIUrl":"https://doi.org/arxiv-2409.06309","url":null,"abstract":"Semantic segmentation is a vital task in the field of remote sensing (RS).\u0000However, conventional convolutional neural network (CNN) and transformer-based\u0000models face limitations in capturing long-range dependencies or are often\u0000computationally intensive. Recently, an advanced state space model (SSM),\u0000namely Mamba, was introduced, offering linear computational complexity while\u0000effectively establishing long-distance dependencies. Despite their advantages,\u0000Mamba-based methods encounter challenges in preserving local semantic\u0000information. To cope with these challenges, this paper proposes a novel network\u0000called Pyramid Pooling Mamba (PPMamba), which integrates CNN and Mamba for RS\u0000semantic segmentation tasks. The core structure of PPMamba, the Pyramid\u0000Pooling-State Space Model (PP-SSM) block, combines a local auxiliary mechanism\u0000with an omnidirectional state space model (OSS) that selectively scans feature\u0000maps from eight directions, capturing comprehensive feature information.\u0000Additionally, the auxiliary mechanism includes pyramid-shaped convolutional\u0000branches designed to extract features at multiple scales. Extensive experiments\u0000on two widely-used datasets, ISPRS Vaihingen and LoveDA Urban, demonstrate that\u0000PPMamba achieves competitive performance compared to state-of-the-art models.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study on Deep Convolutional Neural Networks, Transfer Learning and Ensemble Model for Breast Cancer Detection","authors":"Md Taimur Ahad, Sumaya Mustofa, Faruk Ahmed, Yousuf Rayhan Emon, Aunirudra Dey Anu","doi":"arxiv-2409.06699","DOIUrl":"https://doi.org/arxiv-2409.06699","url":null,"abstract":"In deep learning, transfer learning and ensemble models have shown promise in\u0000improving computer-aided disease diagnosis. However, applying the transfer\u0000learning and ensemble model is still relatively limited. Moreover, the ensemble\u0000model's development is ad-hoc, overlooks redundant layers, and suffers from\u0000imbalanced datasets and inadequate augmentation. Lastly, significant Deep\u0000Convolutional Neural Networks (D-CNNs) have been introduced to detect and\u0000classify breast cancer. Still, very few comparative studies were conducted to\u0000investigate the accuracy and efficiency of existing CNN architectures.\u0000Realising the gaps, this study compares the performance of D-CNN, which\u0000includes the original CNN, transfer learning, and an ensemble model, in\u0000detecting breast cancer. The comparison study of this paper consists of\u0000comparison using six CNN-based deep learning architectures (SE-ResNet152,\u0000MobileNetV2, VGG19, ResNet18, InceptionV3, and DenseNet-121), a transfer\u0000learning, and an ensemble model on breast cancer detection. Among the\u0000comparison of these models, the ensemble model provides the highest detection\u0000and classification accuracy of 99.94% for breast cancer detection and\u0000classification. However, this study also provides a negative result in the case\u0000of transfer learning, as the transfer learning did not increase the accuracy of\u0000the original SE-ResNet152, MobileNetV2, VGG19, ResNet18, InceptionV3, and\u0000DenseNet-121 model. The high accuracy in detecting and categorising breast\u0000cancer detection using CNN suggests that the CNN model is promising in breast\u0000cancer disease detection. This research is significant in biomedical\u0000engineering, computer-aided disease diagnosis, and ML-based disease detection.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Practical Gated Recurrent Transformer Network Incorporating Multiple Fusions for Video Denoising
Kai Guo, Seungwon Choi, Jongseong Choi, Lae-Hoon Kim
arXiv:2409.06603 (2024-09-10)

State-of-the-art (SOTA) video denoising methods employ multi-frame simultaneous denoising mechanisms, resulting in significant delays (e.g., 16 frames), making them impractical for real-time cameras. To overcome this limitation, we propose a multi-fusion gated recurrent Transformer network (GRTN) that achieves SOTA denoising performance with only a single-frame delay. Specifically, the spatial denoising module extracts features from the current frame, while the reset gate selects relevant information from the previous frame and fuses it with the current frame features via the temporal denoising module. The update gate then further blends this result with the previous frame features, and the reconstruction module integrates it with the current frame. To robustly compute attention for noisy features, we propose a residual simplified Swin Transformer with Euclidean distance (RSSTE) in the spatial and temporal denoising modules. Comparative objective and subjective results show that our GRTN achieves denoising performance comparable to SOTA multi-frame-delay networks, with only a single-frame delay.
{"title":"Denoising: A Powerful Building-Block for Imaging, Inverse Problems, and Machine Learning","authors":"Peyman Milanfar, Mauricio Delbracio","doi":"arxiv-2409.06219","DOIUrl":"https://doi.org/arxiv-2409.06219","url":null,"abstract":"Denoising, the process of reducing random fluctuations in a signal to\u0000emphasize essential patterns, has been a fundamental problem of interest since\u0000the dawn of modern scientific inquiry. Recent denoising techniques,\u0000particularly in imaging, have achieved remarkable success, nearing theoretical\u0000limits by some measures. Yet, despite tens of thousands of research papers, the\u0000wide-ranging applications of denoising beyond noise removal have not been fully\u0000recognized. This is partly due to the vast and diverse literature, making a\u0000clear overview challenging. This paper aims to address this gap. We present a comprehensive perspective\u0000on denoisers, their structure, and desired properties. We emphasize the\u0000increasing importance of denoising and showcase its evolution into an essential\u0000building block for complex tasks in imaging, inverse problems, and machine\u0000learning. Despite its long history, the community continues to uncover\u0000unexpected and groundbreaking uses for denoising, further solidifying its place\u0000as a cornerstone of scientific and engineering practice.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}