{"title":"Adversarial robustness improvement for deep neural networks","authors":"Charis Eleftheriadis, Andreas Symeonidis, Panagiotis Katsaros","doi":"10.1007/s00138-024-01519-1","DOIUrl":"https://doi.org/10.1007/s00138-024-01519-1","url":null,"abstract":"<p>Deep neural networks (DNNs) are key components for the implementation of autonomy in systems that operate in highly complex and unpredictable environments (self-driving cars, smart traffic systems, smart manufacturing, etc.). It is well known that DNNs are vulnerable to adversarial examples, i.e. minimal and usually imperceptible perturbations, applied to their inputs, leading to false predictions. This threat poses critical challenges, especially when DNNs are deployed in safety or security-critical systems, and renders as urgent the need for defences that can improve the trustworthiness of DNN functions. Adversarial training has proven effective in improving the robustness of DNNs against a wide range of adversarial perturbations. However, a general framework for adversarial defences is needed that will extend beyond a single-dimensional assessment of robustness improvement; it is essential to consider simultaneously several distance metrics and adversarial attack strategies. Using such an approach we report the results from extensive experimentation on adversarial defence methods that could improve DNNs resilience to adversarial threats. We wrap up by introducing a general adversarial training methodology, which, according to our experimental results, opens prospects for an holistic defence against a range of diverse types of adversarial perturbations.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"76 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140156501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FESAR: SAR ship detection model based on local spatial relationship capture and fused convolutional enhancement","authors":"Chongchong Liu, Chunman Yan","doi":"10.1007/s00138-024-01516-4","DOIUrl":"https://doi.org/10.1007/s00138-024-01516-4","url":null,"abstract":"<p>Synthetic aperture radar (SAR) is instrumental in ship monitoring owing to its all-weather capabilities and high resolution. In SAR images, ship targets frequently display blurred or mixed boundaries with the background, and instances of occlusion or partial occlusion may occur. Additionally, multi-scale transformations and small-target ships pose challenges for ship detection. To tackle these challenges, we propose a novel SAR ship detection model, FESAR. Firstly, in addressing multi-scale transformations in ship detection, we propose the Fused Convolution Enhancement Module (FCEM). This network incorporates distinct convolutional branches designed to capture local and global features, which are subsequently fused and enhanced. Secondly, a Spatial Relationship Analysis Module (SRAM) with a spatial-mixing layer is designed to analyze the local spatial relationship between the ship target and the background, effectively combining local information to discern feature distinctions between the ship target and the background. Finally, a new backbone network, SPD-YOLO, is designed to perform deep downsampling for the comprehensive extraction of semantic information related to ships. To validate the model’s performance, an extensive series of experiments was conducted on the public datasets HRSID, LS-SSDD-v1.0, and SSDD. The results demonstrate the outstanding performance of the proposed FESAR model compared to numerous state-of-the-art (SOTA) models. Relative to the baseline model, FESAR exhibits an improvement in mAP by 2.6% on the HRSID dataset, 5.5% on LS-SSDD-v1.0, and 0.2% on the SSDD dataset. In comparison with numerous SAR ship detection models, FESAR demonstrates superior comprehensive performance.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"52 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140071207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An adaptive interpolation and 3D reconstruction algorithm for underwater images","authors":"Zhijie Tang, Congqi Xu, Siyu Yan","doi":"10.1007/s00138-024-01518-2","DOIUrl":"https://doi.org/10.1007/s00138-024-01518-2","url":null,"abstract":"<p>3D reconstruction technology is gradually applied to underwater scenes, which has become a crucial research direction for human ocean exploration and exploitation. However, due to the complexity of the underwater environment, the number of high-quality underwater images acquired by underwater robots is limited and cannot meet the requirements of 3D reconstruction. Therefore, this paper proposes an adaptive 3D reconstruction algorithm for underwater targets. We apply the frame interpolation technique to underwater 3D reconstruction, an unprecedented technical attempt. In this paper, we design a single-stage large-angle span underwater image interpolation model, which has an excellent enhancement effect on degraded underwater 2D images compared with other methods. Current methods make it challenging to balance the relationship between feature information acquisition and underwater image quality improvement. In this paper, an optimized cascaded feature pyramid scheme and an adaptive bidirectional optical flow estimation algorithm based on underwater NRIQA metrics are proposed and applied to the proposed model to solve the above problems. The intermediate image output from the model improves the image quality and retains the detailed information. Experiments show that the method proposed in this paper outperforms other methods when dealing with several typical degradation types of underwater images. In underwater 3D reconstruction, the intermediate image generated by the model is used as input instead of the degraded image to obtain a denser 3D point cloud and better visualization. Our method is instructive to the problem of acquiring underwater high-quality target images and underwater 3D reconstruction.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"66 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140076399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-supervised Siamese keypoint inference network for human pose estimation and tracking","authors":"","doi":"10.1007/s00138-024-01515-5","DOIUrl":"https://doi.org/10.1007/s00138-024-01515-5","url":null,"abstract":"<h3>Abstract</h3> <p>Human pose estimation and tracking are important tasks to help understand human behavior. Currently, human pose estimation and tracking face the challenges of missed detection due to sparse annotation of video datasets and difficulty in associating partially occluded and unoccluded cases of the same person. To address these challenges, we propose a self-supervised learning-based method, which infers the correspondence between keypoints to associate persons in the videos. Specifically, we propose a bounding box recovery module to recover missed detections and a Siamese keypoint inference network to solve the issue of error matching caused by occlusions. The local–global attention module, which is designed in the Siamese keypoint inference network, learns the varying dependence information of human keypoints between frames. To simulate the occlusions, we mask random pixels in the image before pre-training using knowledge distillation to associate the differing occlusions of the same person. Our method achieves better results than state-of-the-art methods for human pose estimation and tracking on the PoseTrack 2018 and PoseTrack 2021 datasets. Code is available at: https://github.com/yhtian2023/SKITrack.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"54 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140036481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"That’s BAD: blind anomaly detection by implicit local feature clustering","authors":"Jie Zhang, Masanori Suganuma, Takayuki Okatani","doi":"10.1007/s00138-024-01511-9","DOIUrl":"https://doi.org/10.1007/s00138-024-01511-9","url":null,"abstract":"<p>Recent studies on visual anomaly detection (AD) of industrial objects/textures have achieved quite good performance. They consider an unsupervised setting, specifically the one-class setting, in which we assume the availability of a set of normal (i.e., anomaly-free) images for training. In this paper, we consider a more challenging scenario of unsupervised AD, in which we detect anomalies in a given set of images that might contain both normal and anomalous samples. The setting does not assume the availability of known normal data and thus is completely free from human annotation, which differs from the standard AD considered in recent studies. For clarity, we call the setting blind anomaly detection (BAD). We show that BAD can be converted into a local outlier detection problem and propose a novel method named PatchCluster that can accurately detect image- and pixel-level anomalies. Experimental results show that PatchCluster shows a promising performance without the knowledge of normal data, even comparable to the SOTA methods applied in the one-class setting needing it.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"16 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140019366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A pixel and channel enhanced up-sampling module for biomedical image segmentation","authors":"Xuan Zhang, Guoping Xu, Xinglong Wu, Wentao Liao, Xuesong Leng, Xiaxia Wang, Xinwei He, Chang Li","doi":"10.1007/s00138-024-01513-7","DOIUrl":"https://doi.org/10.1007/s00138-024-01513-7","url":null,"abstract":"<p>Up-sampling operations are frequently utilized to recover the spatial resolution of feature maps in neural networks for segmentation task. However, current up-sampling methods, such as bilinear interpolation or deconvolution, do not fully consider the relationship of feature maps, which have negative impact on learning discriminative features for semantic segmentation. In this paper, we propose a pixel and channel enhanced up-sampling (PCE) module for low-resolution feature maps, aiming to use the relationship of adjacent pixels and channels for learning discriminative high-resolution feature maps. Specifically, the proposed up-sampling module includes two main operations: (1) increasing spatial resolution of feature maps with pixel shuffle and (2) recalibrating channel-wise high-resolution feature response. Our proposed up-sampling module could be integrated into CNN and Transformer segmentation architectures. Extensive experiments on three different modality datasets of biomedical images, including computed tomography (CT), magnetic resonance imaging (MRI) and micro-optical sectioning tomography images (MOST) demonstrate the proposed method could effectively improve the performance of representative segmentation models.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"9 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139952540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A gradient fusion-based image data augmentation method for reflective workpieces detection under small size datasets","authors":"Baori Zhang, Haolang Cai, Lingxiang Wen","doi":"10.1007/s00138-024-01512-8","DOIUrl":"https://doi.org/10.1007/s00138-024-01512-8","url":null,"abstract":"<p>Various of Convolutional Neural Network-based object detection models have been widely used in the industrial field. However, the high accuracy of the object detection of these models is difficult to obtain in the industrial sorting line. This is due to the use of small dataset considering of production cost and the changing features of the reflective workpiece. In order to increase the detecting accuracy, a gradient fusion-based image data augmentation method was presented in this paper. It consisted of a high-dynamic range (HDR) exposing algorithm and an image reconstructing algorithm. It augmented the image data for the training and predicting by increasing the feature richness within the regions of reflection and shadow of the image. Tests were conducted on the comparison with other exposing and image fusion methods. The universality of the proposed method was analyzed by testing on various kinds of workpieces and different models including YOLOv8 and SSD. Finally, the Gradient-weighted Class Activation Mapping (Grad-CAM) method and Mean Average Precision (mAP) were used to analyze the model performance improvement. The results showed that the proposed data augmentation method improved the feature richness of the image and the accuracy of the object detection for the reflective workpieces under small size datasets.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"18 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139926925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Target–distractor memory joint tracking algorithm via Credit Allocation Network","authors":"Huanlong Zhang, Panyun Wang, Zhiwu Chen, Jie Zhang, Linwei Li","doi":"10.1007/s00138-024-01508-4","DOIUrl":"https://doi.org/10.1007/s00138-024-01508-4","url":null,"abstract":"<p>The tracking framework based on the memory network has gained significant attention due to its enhanced adaptability to variations in target appearance. However, the performance of the framework is limited by the negative effects of distractors in the background. Hence, this paper proposes a method for tracking using Credit Allocation Network to join target and distractor memory. Specifically, we design a Credit Allocation Network (CAN) that is updated online via Guided Focus Loss. The CAN produces credit scores for tracking results by learning features of the target object, ensuring the update of reliable samples for storage in the memory pool. Furthermore, we construct a multi-domain memory model that simultaneously captures target and background information from multiple historical intervals, which can build a more compatible object appearance model while increasing the diversity of the memory sample. Moreover, a novel target–distractor joint localization strategy is presented, which read target and distractor information from memory frames based on cross-attention, so as to cancel out wrong responses in the target response map by using the distractor response map. The experimental results on OTB-2015, GOT-10k, UAV123, LaSOT, and VOT-2018 datasets show the competitiveness and effectiveness of the proposed method compared to other trackers.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"133 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139765343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"End-to-end optimized image compression with the frequency-oriented transform","authors":"Yuefeng Zhang, Kai Lin","doi":"10.1007/s00138-023-01507-x","DOIUrl":"https://doi.org/10.1007/s00138-023-01507-x","url":null,"abstract":"<p>Image compression constitutes a significant challenge amid the era of information explosion. Recent studies employing deep learning methods have demonstrated the superior performance of learning-based image compression methods over traditional codecs. However, an inherent challenge associated with these methods lies in their lack of interpretability. Following an analysis of the varying degrees of compression degradation across different frequency bands, we propose the end-to-end optimized image compression model facilitated by the frequency-oriented transform. The proposed end-to-end image compression model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, aligning with the human-interpretable concept. Leveraging the non-overlapping hypothesis, the model enables scalable coding through the selective transmission of arbitrary frequency components. Extensive experiments are conducted to demonstrate that our model outperforms all traditional codecs including next-generation standard H.266/VVC on MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) are conducted to verify the proposed compression method that could preserve semantic fidelity besides signal-level precision.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"33 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139765342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interaction semantic segmentation network via progressive supervised learning","authors":"","doi":"10.1007/s00138-023-01500-4","DOIUrl":"https://doi.org/10.1007/s00138-023-01500-4","url":null,"abstract":"<h3>Abstract</h3> <p>Semantic segmentation requires both low-level details and high-level semantics, without losing too much detail and ensuring the speed of inference. Most existing segmentation approaches leverage low- and high-level features from pre-trained models. We propose an interaction semantic segmentation network via Progressive Supervised Learning (ISSNet). Unlike a simple fusion of two sets of features, we introduce an information interaction module to embed semantics into image details, they jointly guide the response of features in an interactive way. We develop a simple yet effective boundary refinement module to provide refined boundary features for matching corresponding semantic. We introduce a progressive supervised learning strategy throughout the training level to significantly promote network performance, not architecture level. Our proposed ISSNet shows optimal inference time. We perform extensive experiments on four datasets, including Cityscapes, HazeCityscapes, RainCityscapes and CamVid. In addition to performing better in fine weather, proposed ISSNet also performs well on rainy and foggy days. We also conduct ablation study to demonstrate the role of our proposed component. Code is available at: https://github.com/Ruini94/ISSNet</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"10 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139765416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}