TabMixer: Noninvasive Estimation of the Mean Pulmonary Artery Pressure via Imaging and Tabular Data Mixing
Michal K. Grzeszczyk, Przemysław Korzeniowski, Samer Alabed, Andrew J. Swift, Tomasz Trzciński, Arkadiusz Sitek
arXiv:2409.07564 (2024-09-11)

Right Heart Catheterization is the gold-standard procedure for diagnosing Pulmonary Hypertension by measuring mean Pulmonary Artery Pressure (mPAP). It is invasive, costly, time-consuming and carries risks. In this paper, for the first time, we explore the estimation of mPAP from videos of noninvasive Cardiac Magnetic Resonance Imaging. To enhance the predictive capabilities of Deep Learning models used for this task, we introduce an additional modality in the form of demographic features and clinical measurements. Inspired by all-Multilayer Perceptron architectures, we present TabMixer, a novel module enabling the integration of imaging and tabular data through spatial, temporal and channel mixing. Specifically, we present the first approach that utilizes Multilayer Perceptrons to interchange tabular information with imaging features in vision models. We test TabMixer for mPAP estimation and show that it enhances the performance of Convolutional Neural Networks, 3D-MLP and Vision Transformers while being competitive with previous modules for imaging and tabular data. Our approach has the potential to improve clinical processes involving both modalities, particularly noninvasive mPAP estimation, thus significantly enhancing the quality of life for individuals affected by Pulmonary Hypertension. Source code for TabMixer is available at https://github.com/SanoScience/TabMixer.
A comprehensive study on Blood Cancer detection and classification using Convolutional Neural Network
Md Taimur Ahad, Sajib Bin Mamun, Sumaya Mustofa, Bo Song, Yan Li
arXiv:2409.06689 (2024-09-10)

Over the years, several efficient Convolutional Neural Network (CNN) architectures, such as DenseNet201, InceptionV3, ResNet152v2, SEresNet152, VGG19 and Xception, have gained significant attention in object detection due to their performance. Moreover, CNN paradigms have expanded to transfer learning and ensemble models built from original CNN architectures. Research studies suggest that transfer learning and ensemble models are capable of increasing the accuracy of deep learning (DL) models. However, very few studies have conducted comprehensive experiments utilizing these techniques to detect and localize blood malignancies. Realizing this gap, this study conducted three experiments: in the first, six original CNNs were used; in the second, transfer learning; and in the third, a novel ensemble model, DIX (DenseNet201, InceptionV3, and Xception), was developed to detect and classify blood cancer. The statistical results suggest that DIX outperformed both the original and transfer-learning models, providing an accuracy of 99.12%. However, the study also reports a negative result for transfer learning, as it did not increase the accuracy of the original CNNs. Like many other cancers, blood cancer requires timely identification for effective treatment planning and improved survival chances. The high accuracy in detecting and classifying blood cancer using CNNs suggests that CNN models are promising for blood cancer detection. This research is significant for biomedical engineering, computer-aided disease diagnosis, and ML-based disease detection.
{"title":"Universal End-to-End Neural Network for Lossy Image Compression","authors":"Bouzid Arezki, Fangchen Feng, Anissa Mokraoui","doi":"arxiv-2409.06586","DOIUrl":"https://doi.org/arxiv-2409.06586","url":null,"abstract":"This paper presents variable bitrate lossy image compression using a\u0000VAE-based neural network. An adaptable image quality adjustment strategy is\u0000proposed. The key innovation involves adeptly adjusting the input scale\u0000exclusively during the inference process, resulting in an exceptionally\u0000efficient rate-distortion mechanism. Through extensive experimentation, across\u0000diverse VAE-based compression architectures (CNN, ViT) and training\u0000methodologies (MSE, SSIM), our approach exhibits remarkable universality. This\u0000success is attributed to the inherent generalization capacity of neural\u0000networks. Unlike methods that adjust model architecture or loss functions, our\u0000approach emphasizes simplicity, reducing computational complexity and memory\u0000requirements. The experiments not only highlight the effectiveness of our\u0000approach but also indicate its potential to drive advancements in variable-rate\u0000neural network lossy image compression methodologies.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interactive 3D Segmentation for Primary Gross Tumor Volume in Oropharyngeal Cancer
Mikko Saukkoriipi, Jaakko Sahlsten, Joel Jaskari, Lotta Orasmaa, Jari Kangas, Nastaran Rasouli, Roope Raisamo, Jussi Hirvonen, Helena Mehtonen, Jorma Järnstedt, Antti Mäkitie, Mohamed Naser, Clifton Fuller, Benjamin Kann, Kimmo Kaski
arXiv:2409.06605 (2024-09-10)

The main treatment modality for oropharyngeal cancer (OPC) is radiotherapy, where accurate segmentation of the primary gross tumor volume (GTVp) is essential. However, accurate GTVp segmentation is challenging due to significant interobserver variability and the time-consuming nature of manual annotation, while fully automated methods can occasionally fail. An interactive deep learning (DL) model offers the advantage of automatic high-performance segmentation with the flexibility for user correction when necessary. In this study, we examine interactive DL for GTVp segmentation in OPC. We implement state-of-the-art algorithms and propose a novel two-stage Interactive Click Refinement (2S-ICR) framework. Using the 2021 HEad and neCK TumOR (HECKTOR) dataset for development and an external dataset from The University of Texas MD Anderson Cancer Center for evaluation, the 2S-ICR framework achieves a Dice similarity coefficient of 0.713 ± 0.152 without user interaction and 0.824 ± 0.099 after five interactions, outperforming existing methods in both cases.
Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models
Siyu Zhai, Zhibo He, Xiaofeng Cong, Junming Hou, Jie Gui, Jian Wei You, Xin Gong, James Tin-Yau Kwok, Yuan Yan Tang
arXiv:2409.06420 (2024-09-10)

Learning-based methods for underwater image enhancement (UWIE) have undergone extensive exploration. However, learning-based models are usually vulnerable to adversarial examples, and UWIE models are no exception. To the best of our knowledge, there is no comprehensive study on the adversarial robustness of UWIE models, which indicates that UWIE models are potentially under the threat of adversarial attacks. In this paper, we propose a general adversarial attack protocol. We make a first attempt to conduct adversarial attacks on five well-designed UWIE models on three common underwater image benchmark datasets. Considering the scattering and absorption of light in the underwater environment, there exists a strong correlation between color correction and underwater image enhancement. On this basis, we also design two effective UWIE-oriented adversarial attack methods, Pixel Attack and Color Shift Attack, targeting different color spaces. The results show that the five models exhibit varying degrees of vulnerability to adversarial attacks, and well-designed small perturbations on degraded images are capable of preventing UWIE models from generating enhanced results. Furthermore, we conduct adversarial training on these models and successfully mitigate the effectiveness of the attacks. In summary, we reveal the adversarial vulnerability of UWIE models and propose a new evaluation dimension for UWIE models.
Ordinal Learning: Longitudinal Attention Alignment Model for Predicting Time to Future Breast Cancer Events from Mammograms
Xin Wang, Tao Tan, Yuan Gao, Eric Marcus, Luyi Han, Antonio Portaluri, Tianyu Zhang, Chunyao Lu, Xinglong Liang, Regina Beets-Tan, Jonas Teuwen, Ritse Mann
arXiv:2409.06887 (2024-09-10)

Precision breast cancer (BC) risk assessment is crucial for developing individualized screening and prevention. Despite the promising potential of recent mammogram (MG) based deep learning models in predicting BC risk, they mostly overlook the 'time-to-future-event' ordering among patients and offer limited insight into how they track historical changes in breast tissue, thereby limiting their clinical application. In this work, we propose a novel method, named OA-BreaCR, to precisely model the ordinal relationship of the time to and between BC events while incorporating longitudinal breast tissue changes in a more explainable manner. We validate our method on the public EMBED and in-house datasets, comparing it with existing BC risk prediction and time prediction methods. Our ordinal learning method OA-BreaCR outperforms existing methods on both BC risk and time-to-future-event prediction tasks. Additionally, ordinal heatmap visualizations show the model's attention over time. Our findings underscore the importance of interpretable and precise risk assessment for enhancing BC screening and prevention efforts. The code will be publicly accessible.
{"title":"PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation","authors":"Yin Hu, Xianping Ma, Jialu Sui, Man-On Pun","doi":"arxiv-2409.06309","DOIUrl":"https://doi.org/arxiv-2409.06309","url":null,"abstract":"Semantic segmentation is a vital task in the field of remote sensing (RS).\u0000However, conventional convolutional neural network (CNN) and transformer-based\u0000models face limitations in capturing long-range dependencies or are often\u0000computationally intensive. Recently, an advanced state space model (SSM),\u0000namely Mamba, was introduced, offering linear computational complexity while\u0000effectively establishing long-distance dependencies. Despite their advantages,\u0000Mamba-based methods encounter challenges in preserving local semantic\u0000information. To cope with these challenges, this paper proposes a novel network\u0000called Pyramid Pooling Mamba (PPMamba), which integrates CNN and Mamba for RS\u0000semantic segmentation tasks. The core structure of PPMamba, the Pyramid\u0000Pooling-State Space Model (PP-SSM) block, combines a local auxiliary mechanism\u0000with an omnidirectional state space model (OSS) that selectively scans feature\u0000maps from eight directions, capturing comprehensive feature information.\u0000Additionally, the auxiliary mechanism includes pyramid-shaped convolutional\u0000branches designed to extract features at multiple scales. Extensive experiments\u0000on two widely-used datasets, ISPRS Vaihingen and LoveDA Urban, demonstrate that\u0000PPMamba achieves competitive performance compared to state-of-the-art models.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study on Deep Convolutional Neural Networks, Transfer Learning and Ensemble Model for Breast Cancer Detection","authors":"Md Taimur Ahad, Sumaya Mustofa, Faruk Ahmed, Yousuf Rayhan Emon, Aunirudra Dey Anu","doi":"arxiv-2409.06699","DOIUrl":"https://doi.org/arxiv-2409.06699","url":null,"abstract":"In deep learning, transfer learning and ensemble models have shown promise in\u0000improving computer-aided disease diagnosis. However, applying the transfer\u0000learning and ensemble model is still relatively limited. Moreover, the ensemble\u0000model's development is ad-hoc, overlooks redundant layers, and suffers from\u0000imbalanced datasets and inadequate augmentation. Lastly, significant Deep\u0000Convolutional Neural Networks (D-CNNs) have been introduced to detect and\u0000classify breast cancer. Still, very few comparative studies were conducted to\u0000investigate the accuracy and efficiency of existing CNN architectures.\u0000Realising the gaps, this study compares the performance of D-CNN, which\u0000includes the original CNN, transfer learning, and an ensemble model, in\u0000detecting breast cancer. The comparison study of this paper consists of\u0000comparison using six CNN-based deep learning architectures (SE-ResNet152,\u0000MobileNetV2, VGG19, ResNet18, InceptionV3, and DenseNet-121), a transfer\u0000learning, and an ensemble model on breast cancer detection. Among the\u0000comparison of these models, the ensemble model provides the highest detection\u0000and classification accuracy of 99.94% for breast cancer detection and\u0000classification. However, this study also provides a negative result in the case\u0000of transfer learning, as the transfer learning did not increase the accuracy of\u0000the original SE-ResNet152, MobileNetV2, VGG19, ResNet18, InceptionV3, and\u0000DenseNet-121 model. The high accuracy in detecting and categorising breast\u0000cancer detection using CNN suggests that the CNN model is promising in breast\u0000cancer disease detection. This research is significant in biomedical\u0000engineering, computer-aided disease diagnosis, and ML-based disease detection.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Practical Gated Recurrent Transformer Network Incorporating Multiple Fusions for Video Denoising
Kai Guo, Seungwon Choi, Jongseong Choi, Lae-Hoon Kim
arXiv:2409.06603 (2024-09-10)

State-of-the-art (SOTA) video denoising methods employ multi-frame simultaneous denoising mechanisms, resulting in significant delays (e.g., 16 frames), making them impractical for real-time cameras. To overcome this limitation, we propose a multi-fusion gated recurrent Transformer network (GRTN) that achieves SOTA denoising performance with only a single-frame delay. Specifically, the spatial denoising module extracts features from the current frame, while the reset gate selects relevant information from the previous frame and fuses it with the current frame features via the temporal denoising module. The update gate then further blends this result with the previous frame features, and the reconstruction module integrates it with the current frame. To robustly compute attention for noisy features, we propose a residual simplified Swin Transformer with Euclidean distance (RSSTE) in the spatial and temporal denoising modules. Comparative objective and subjective results show that our GRTN achieves denoising performance comparable to SOTA multi-frame-delay networks, with only a single-frame delay.
{"title":"Denoising: A Powerful Building-Block for Imaging, Inverse Problems, and Machine Learning","authors":"Peyman Milanfar, Mauricio Delbracio","doi":"arxiv-2409.06219","DOIUrl":"https://doi.org/arxiv-2409.06219","url":null,"abstract":"Denoising, the process of reducing random fluctuations in a signal to\u0000emphasize essential patterns, has been a fundamental problem of interest since\u0000the dawn of modern scientific inquiry. Recent denoising techniques,\u0000particularly in imaging, have achieved remarkable success, nearing theoretical\u0000limits by some measures. Yet, despite tens of thousands of research papers, the\u0000wide-ranging applications of denoising beyond noise removal have not been fully\u0000recognized. This is partly due to the vast and diverse literature, making a\u0000clear overview challenging. This paper aims to address this gap. We present a comprehensive perspective\u0000on denoisers, their structure, and desired properties. We emphasize the\u0000increasing importance of denoising and showcase its evolution into an essential\u0000building block for complex tasks in imaging, inverse problems, and machine\u0000learning. Despite its long history, the community continues to uncover\u0000unexpected and groundbreaking uses for denoising, further solidifying its place\u0000as a cornerstone of scientific and engineering practice.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}