{"title":"Mask-Guided Image Person Removal with Data Synthesis","authors":"Yunliang Jiang, Chenyang Gu, Zhenfeng Xue, Xiongtao Zhang, Yong Liu","doi":"10.2139/ssrn.4234905","DOIUrl":"https://doi.org/10.2139/ssrn.4234905","url":null,"abstract":"As a special case of common object removal, image person removal is playing an increasingly important role in social media and criminal investigation domains. Due to the integrity of person area and the complexity of human posture, person removal has its own dilemmas. In this paper, we propose a novel idea to tackle these problems from the perspective of data synthesis. Concerning the lack of dedicated dataset for image person removal, two dataset production methods are proposed to automatically generate images, masks and ground truths respectively. Then, a learning framework similar to local image degradation is proposed so that the masks can be used to guide the feature extraction process and more texture information can be gathered for final prediction. A coarse-to-fine training strategy is further applied to refine the details. The data synthesis and learning framework combine well with each other. Experimental results verify the effectiveness of our method quantitatively and qualitatively, and the trained network proves to have good generalization ability either on real or synthetic images.","PeriodicalId":13486,"journal":{"name":"IET Image Process.","volume":"16 1","pages":"2214-2224"},"PeriodicalIF":0.0,"publicationDate":"2022-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84157705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EDAfuse: A encoder-decoder with atrous spatial pyramid network for infrared and visible image fusion","authors":"Cairen Nie, Dongming Zhou, Rencan Nie","doi":"10.2139/ssrn.3982278","DOIUrl":"https://doi.org/10.2139/ssrn.3982278","url":null,"abstract":"","PeriodicalId":13486,"journal":{"name":"IET Image Process.","volume":"1 1","pages":"132-143"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79538121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STDC-MA Network for Semantic Segmentation","authors":"Xiaochun Lei, Linjun Lu, Zetao Jiang, Zhaoting Gong, Chang Lu, Jiaming Liang","doi":"10.48550/arXiv.2205.04639","DOIUrl":"https://doi.org/10.48550/arXiv.2205.04639","url":null,"abstract":"Semantic segmentation is applied extensively in autonomous driving and intelligent transportation with methods that highly demand spatial and semantic information. Here, an STDC-MA network is proposed to meet these demands. First, the STDC-Seg structure is employed in STDC-MA to ensure a lightweight and efficient structure. Subsequently, the feature alignment module (FAM) is applied to understand the offset between high-level and low-level features, solving the problem of pixel offset related to upsampling on the high-level feature map. Our approach implements the effective fusion between high-level features and low-level features. A hierarchical multiscale attention mechanism is adopted to reveal the relationship among attention regions from two different input sizes of one image. Through this relationship, regions receiving much attention are integrated into the segmentation results, thereby reducing the unfocused regions of the input image and improving the effective utilization of multiscale features. STDC- MA maintains the segmentation speed as an STDC-Seg network while improving the segmentation accuracy of small objects. STDC-MA was verified on the verification set of Cityscapes. The segmentation result of STDC-MA attained 76.81% mIOU with the input of 0.5x scale, 3.61% higher than STDC-Seg.","PeriodicalId":13486,"journal":{"name":"IET Image Process.","volume":"32 1","pages":"3758-3767"},"PeriodicalIF":0.0,"publicationDate":"2022-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78055664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-similarity based Hyperrelation Network for few-shot segmentation","authors":"Xian Shi, Zhe Cui, Shaobing Zhang, Miao Cheng, L. He, Xianghong Tang","doi":"10.48550/arXiv.2203.09550","DOIUrl":"https://doi.org/10.48550/arXiv.2203.09550","url":null,"abstract":"Few-shot semantic segmentation aims at recognizing the object regions of unseen categories with only a few annotated examples as supervision. The key to few-shot segmentation is to establish a robust semantic relationship between the support and query images and to prevent overfitting. In this paper, we propose an effective Multi-similarity Hyperrelation Network (MSHNet) to tackle the few-shot semantic segmentation problem. In MSHNet, we propose a new Generative Prototype Similarity (GPS), which together with cosine similarity can establish a strong semantic relation between the support and query images. The locally generated prototype similarity based on global feature is logically complementary to the global cosine similarity based on local feature, and the relationship between the query image and the supported image can be expressed more comprehensively by using the two similarities simultaneously. In addition, we propose a Symmetric Merging Block (SMB) in MSHNet to efficiently merge multi-layer, multi-shot and multi-similarity hyperrelational features. MSHNet is built on the basis of similarity rather than specific category features, which can achieve more general unity and effectively reduce overfitting. On two benchmark semantic segmentation datasets Pascal-5i and COCO-20i, MSHNet achieves new state-of-the-art performances on 1-shot and 5-shot semantic segmentation tasks.","PeriodicalId":13486,"journal":{"name":"IET Image Process.","volume":"9 1","pages":"204-214"},"PeriodicalIF":0.0,"publicationDate":"2022-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90647288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Screen-Shooting Resilient Document Image Watermarking Scheme using Deep Neural Network","authors":"Sulong Ge, Zhihua Xia, Yao Tong, Jian Weng, Jia-Nan Liu","doi":"10.48550/arXiv.2203.05198","DOIUrl":"https://doi.org/10.48550/arXiv.2203.05198","url":null,"abstract":"With the advent of the screen-reading era, the confidential documents displayed on the screen can be easily captured by a camera without leaving any traces. Thus, this paper proposes a novel screen-shooting resilient watermarking scheme for document image using deep neural network. By applying this scheme, when the watermarked image is displayed on the screen and captured by a camera, the watermark can be still extracted from the captured photographs. Specifically, our scheme is an end-to-end neural network with an encoder to embed watermark and a decoder to extract watermark. During the training process, a distortion layer between encoder and decoder is added to simulate the distortions introduced by screen-shooting process in real scenes, such as camera distortion, shooting distortion, light source distortion. Besides, an embedding strength adjustment strategy is designed to improve the visual quality of the watermarked image with little loss of extraction accuracy. The experimental results show that the scheme has higher robustness and visual quality than other three recent state-of-the-arts. Specially, even if the shooting distances and angles are in extreme, our scheme can also obtain high extraction accuracy.","PeriodicalId":13486,"journal":{"name":"IET Image Process.","volume":"312 1","pages":"323-336"},"PeriodicalIF":0.0,"publicationDate":"2022-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75817191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geodesic Gramian Denoising Applied to the Images Contaminated With Noise Sampled From Diverse Probability Distributions","authors":"Yonggi Park, K. Gajamannage, Alexey L. Sadovski","doi":"10.48550/arXiv.2203.02600","DOIUrl":"https://doi.org/10.48550/arXiv.2203.02600","url":null,"abstract":"As quotidian use of sophisticated cameras surges, people in modern society are more interested in capturing fine-quality images. However, the quality of the images might be inferior to people's expectations due to the noise contamination in the images. Thus, filtering out the noise while preserving vital image features is an essential requirement. Current existing denoising methods have their own assumptions on the probability distribution in which the contaminated noise is sampled for the method to attain its expected denoising performance. In this paper, we utilize our recent Gramian-based filtering scheme to remove noise sampled from five prominent probability distributions from selected images. This method preserves image smoothness by adopting patches partitioned from the image, rather than pixels, and retains vital image features by performing denoising on the manifold underlying the patch space rather than in the image domain. We validate its denoising performance, using three benchmark computer vision test images applied to two state-of-the-art denoising methods, namely BM3D and K-SVD.","PeriodicalId":13486,"journal":{"name":"IET Image Process.","volume":"122 6 1","pages":"144-156"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80212077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spectral recovery-guided hyperspectral super-resolution using transfer learning","authors":"Shaolei Zhang, Guangyuan Fu, Hongqiao Wang, Yuqing Zhao","doi":"10.1049/IPR2.12253","DOIUrl":"https://doi.org/10.1049/IPR2.12253","url":null,"abstract":"Single hyperspectral image (HSI) super-resolution (SR) has attracted researcher’s attention; however, most existing methods directly model the mapping between low- and high-resolution images from an external training dataset, which requires large memory and com-puting resources. Moreover, there are few such available training datasets in real cases, which prevent deep-learning-based methods from further improving performance. Here, a novel single HSI SR method based on transfer learning is proposed. The proposed method is composed of two stages: spectral down-sampled image SR reconstruction based on transfer learning and HSI reconstruction via spectral recovery module. Instead of directly applying the learned knowledge from the colour image domain to HSI SR, the spectral down-sampled image is fed into a spatial SR model to obtain a high-resolution image, which acts as a bridge between the colour image and HSI. The spectral recovery network is used to restore the HSI from the bridge image. In addition, pre-training and collaborative fine-tuning are proposed to promote the performance of SR and spectral recovery. Experiments on two public HSI datasets show that the proposed method achieves promising SR performance with a small paired HSI dataset.","PeriodicalId":13486,"journal":{"name":"IET Image Process.","volume":"30 1","pages":"2656-2665"},"PeriodicalIF":0.0,"publicationDate":"2021-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74797409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Texture and exposure awareness based refill for HDRI reconstruction of saturated and occluded areas","authors":"Jianming Zhou, Yipeng Deng, Qin Liu, T. Ikenaga","doi":"10.1049/IPR2.12257","DOIUrl":"https://doi.org/10.1049/IPR2.12257","url":null,"abstract":"High-dynamic-range image (HDRI) displays scenes as vivid as the real scenes. HDRI can be reconstructed by fusing a set of bracketed-exposure low-dynamic-range images (LDRI). For the reconstruction, many works succeed in removing the ghost artefacts caused by moving objects. The critical issue is reconstructing the areas which are saturated due to bad exposure and occluded due to motion with no ghost artefacts. To overcome this issue, this paper proposes texture and exposure awareness based refill. The proposed work first locates the saturated and occluded areas existing in input image set, then refills background textures or patches containing rough exposure and colour information into located areas. Proposed work can be integrated with multiple existing ghost removal works to improve the reconstruction result. Experimental results show that proposed work removes the ghost artefacts caused by saturated and occluded areas in subjective evaluation. For the objective evaluation, the proposed work improves the HDR-VDP-2 evaluation result for multiple conventional works by 1.33% on average.","PeriodicalId":13486,"journal":{"name":"IET Image Process.","volume":"32 7","pages":"2705-2716"},"PeriodicalIF":0.0,"publicationDate":"2021-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91407639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An improved algorithm using weighted guided coefficient and union self-adaptive image enhancement for single image haze removal","authors":"Guangbin Zhou, Lifeng He, Yong Qi, Meimei Yang, Xiao Zhao, Y. Chao","doi":"10.1049/IPR2.12255","DOIUrl":"https://doi.org/10.1049/IPR2.12255","url":null,"abstract":"The visibility of outdoor images is usually significantly degraded by haze. Existing dehazing algorithms, such as dark channel prior (DCP) and colour attenuation prior (CAP), have made great progress and are highly effective. However, they all suffer from the problems of dark distortion and detailed information loss. This paper proposes an improved algorithm for single-image haze removal based on dark channel prior with weighted guided coefficient and union self-adaptive image enhancement. First, a weighted guided coefficient method with sampling based on guided image filtering is proposed to refine the transmission map efficiently. Second, the k -means clustering method is adopted to calibrate the original image into bright and non-bright colour areas and form a transmission constraint matrix. The constraint matrix is then marked by connected-component labelling, and small bright regions are eliminated to form an atmospheric light constraint matrix, which can suppress the halo effect and optimize the atmospheric light. Finally, an adaptive linear contrast enhancement algorithm with a union score is proposed to optimize restored images. Experimental results demonstrate that the proposed algorithm can overcome the problems of image distortion and detailed information loss and is more efficient than conventional dehazing algorithms.","PeriodicalId":13486,"journal":{"name":"IET Image Process.","volume":"94 1","pages":"2680-2692"},"PeriodicalIF":0.0,"publicationDate":"2021-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87041728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}