{"title":"ClGanNet: A novel method for maize leaf disease identification using ClGan and deep CNN","authors":"Vivek Sharma , Ashish Kumar Tripathi , Purva Daga , Nidhi M. , Himanshu Mittal","doi":"10.1016/j.image.2023.117074","DOIUrl":"https://doi.org/10.1016/j.image.2023.117074","url":null,"abstract":"<div><p>With the advancement of technologies, automatic plant leaf disease detection has received considerable attention from researchers working in the area of precision agriculture. A number of deep learning-based methods have been introduced in the literature for automated plant disease detection. However, the majority of datasets collected from real fields have blurred background information, data imbalances, limited generalization, and tiny lesion features, which may lead to over-fitting of the model. Moreover, the increased parameter size of deep learning models is also a concern, especially for agricultural applications with limited resources. In this paper, a novel ClGan (Crop Leaf Gan) with an improved loss function has been developed with a reduced number of parameters as compared to the existing state-of-the-art methods. The generator and discriminator of the developed ClGan are built around an encoder–decoder network to avoid the vanishing gradient problem, training instability, and non-convergence failure while preserving complex intricacies during synthetic image generation with significant lesion differentiation. The proposed improved loss function introduces a dynamic correction factor that stabilizes learning while maintaining effective weight optimization. In addition, a novel plant leaf classification method, ClGanNet, has been introduced to classify plant diseases efficiently. 
The efficiency of the proposed ClGan was validated on the maize leaf dataset in terms of the number of parameters and FID score, and the results were compared against five other state-of-the-art GAN models, namely DC-GAN, W-GAN, WGAN-GP, InfoGan, and LeafGan. Moreover, the performance of the proposed classifier, ClGanNet, was evaluated with seven state-of-the-art methods against eight parameters on the original, basic augmented, and ClGan augmented datasets. In the experiments, ClGanNet outperformed all the considered methods with 99.97% training and 99.04% testing accuracy while using the least number of parameters.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"120 ","pages":"Article 117074"},"PeriodicalIF":3.5,"publicationDate":"2023-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91987222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
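Editor's note: the ClGan record above reports FID scores against other GANs. As a reader aid, here is a minimal sketch of what FID computes — the Fréchet distance between Gaussian fits of real and generated feature sets. Real FID uses Inception features and full covariance matrices (with a matrix square root); the diagonal-covariance shortcut below is our simplification for illustration, not the paper's code.

```python
import numpy as np

def fid_diagonal(feats_real, feats_fake):
    """Frechet distance between Gaussian fits of two feature sets,
    simplified to diagonal covariances for illustration."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    var1, var2 = feats_real.var(axis=0), feats_fake.var(axis=0)
    mean_term = np.sum((mu1 - mu2) ** 2)
    # With diagonal covariances the trace term reduces elementwise.
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return float(mean_term + cov_term)
```

Identical feature sets score 0; shifting every feature by a constant adds the squared shift per dimension, so lower is better.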
{"title":"Image tone mapping based on clustering and human visual system models","authors":"Xueyu Han , Ishtiaq Rasool Khan , Susanto Rahardja","doi":"10.1016/j.image.2023.117075","DOIUrl":"10.1016/j.image.2023.117075","url":null,"abstract":"<div><p><span><span>Natural scenes generally have a very high dynamic range (HDR) which cannot be captured in standard dynamic range (SDR) images. HDR imaging techniques can be used to capture these details in both dark and bright regions, and the resultant HDR images can be tone mapped to reproduce them on SDR displays. To adapt to different applications, a tone mapping operator (TMO) should be able to achieve high performance for diverse HDR scenes. In this paper, we present a clustering-based TMO by embedding </span>human visual system models that function effectively in different scenes. A hierarchical scheme is applied for clustering to reduce the </span>computational complexity<span>. We also propose a detail preservation method by superimposing the details of the original HDR images to enhance local contrasts, and a color preservation method by limiting the adaptive saturation parameter to control color saturation attenuation. The effectiveness of our method is assessed by comparing it with state-of-the-art TMOs quantitatively on large-scale HDR datasets and qualitatively with a group of subjects. 
Experimental results of both objective and subjective evaluations show that the proposed method achieves improvements over the competing methods in generating high-quality tone-mapped images with good contrast and natural color appearance for diverse HDR scenes.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"120 ","pages":"Article 117075"},"PeriodicalIF":3.5,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136093478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
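Editor's note: the tone-mapping record above builds on classic HVS-inspired operators. For orientation, a sketch of the global Reinhard baseline — not the paper's clustering-based TMO — which scales luminance by its log-average and compresses with L/(1+L); the `key` default of 0.18 is the conventional mid-grey choice.

```python
import numpy as np

def reinhard_tmo(hdr, key=0.18, eps=1e-6):
    """Global Reinhard operator on a luminance array: normalize by
    the log-average luminance, then compress with L/(1+L)."""
    log_avg = np.exp(np.mean(np.log(hdr + eps)))  # log-average luminance
    scaled = key * hdr / log_avg                  # map scene to mid-grey
    return scaled / (1.0 + scaled)                # compress into [0, 1)
```

The output stays in [0, 1) and preserves the brightness ordering of the input, which is what makes it displayable on SDR screens.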
{"title":"Individual tooth segmentation in human teeth images using pseudo edge-region obtained by deep neural networks","authors":"Seongeun Kim, Chang-Ock Lee","doi":"10.1016/j.image.2023.117076","DOIUrl":"https://doi.org/10.1016/j.image.2023.117076","url":null,"abstract":"<div><p><span><span>In human teeth images taken outside the oral cavity with a general optical camera, it is difficult to segment individual teeth due to common obstacles such as weak edges, intensity inhomogeneities, and strong light reflections. In this work, we propose a method for segmenting individual teeth in human teeth images. The key to this method is to obtain a pseudo edge-region using </span>deep neural networks. After an additional step to obtain </span>initial contours<span><span> for each tooth region, each individual tooth is segmented by applying active contour models. We also present a strategy using existing model-based methods for labeling the data required for </span>neural network training.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"120 ","pages":"Article 117076"},"PeriodicalIF":3.5,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91987221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Are metrics measuring what they should? An evaluation of Image Captioning task metrics","authors":"Othón González-Chávez , Guillermo Ruiz , Daniela Moctezuma , Tania Ramirez-delReal","doi":"10.1016/j.image.2023.117071","DOIUrl":"https://doi.org/10.1016/j.image.2023.117071","url":null,"abstract":"<div><p><span>Image Captioning is a current research task to describe the image content using the objects and their relationships in the scene. Two important research areas converge to tackle this task: artificial vision and natural language processing. In Image Captioning, as in any computational intelligence task, the performance metrics are crucial for knowing how well (or badly) a method performs. In recent years, it has been observed that classical metrics based on </span><span><math><mi>n</mi></math></span>-grams are insufficient to capture the semantics and the critical meaning to describe the content in an image. To measure how well the current and more recent metrics are doing, in this article we present an evaluation of several kinds of Image Captioning metrics and a comparison between them using the well-known MS-COCO and Flickr8k datasets. The metrics were selected from the most used in prior works; they are those based on <span><math><mi>n</mi></math></span>-grams, such as BLEU, SacreBLEU, METEOR, ROUGE-L, CIDEr, SPICE, and those based on embeddings, such as BERTScore and CLIPScore. We designed two scenarios for this: (1) a set of artificially built captions with several qualities and (2) a comparison of some state-of-the-art Image Captioning methods. Interesting findings emerged in trying to answer the questions: Are the current metrics helping to produce high-quality captions? How do actual metrics compare to each other? 
What are the metrics <em>really</em> measuring?</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"120 ","pages":"Article 117071"},"PeriodicalIF":3.5,"publicationDate":"2023-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49833433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
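Editor's note: the metrics survey above centres on n-gram measures such as BLEU. Their core ingredient is clipped n-gram precision, sketched below in pure Python; full BLEU additionally geometric-averages several n-gram orders and applies a brevity penalty, which this toy version omits.

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision: each candidate n-gram is credited
    at most as many times as it occurs in the reference."""
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    total = max(sum(cand_ngrams.values()), 1)
    return clipped / total
```

The clipping is what stops a degenerate caption like "the the the" from scoring well against a reference containing "the" once — exactly the kind of failure mode the article probes.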
{"title":"A transformer-based network for perceptual contrastive underwater image enhancement","authors":"Na Cheng, Zhixuan Sun, Xuanbing Zhu, Hongyu Wang","doi":"10.1016/j.image.2023.117032","DOIUrl":"https://doi.org/10.1016/j.image.2023.117032","url":null,"abstract":"<div><p>Vision-based underwater image enhancement methods have received much attention for application in the fields of marine engineering and marine science. The absorption and scattering of light in real underwater scenes leads to severe information degradation in the acquired underwater images, thus limiting further development of underwater tasks. To solve these problems, a novel transformer-based perceptual contrastive network for underwater image enhancement (TPC-UIE) is proposed to achieve visually friendly and high-quality images, where contrastive learning<span> is applied to the underwater image enhancement (UIE) task for the first time. Specifically, to address the limitations of the pure convolution-based network, we embed the transformer into the UIE network to improve its ability to capture global dependencies. The limits of the transformer are then taken into account as convolution is reintroduced to better capture local attention. At the same time, the dual-attention module strengthens the network’s focus on the spatial and color channels that are more severely attenuated. Finally, a perceptual contrastive regularization method is proposed, where a multi-loss function made up of reconstruction loss, perceptual loss, and contrastive loss jointly optimizes the model to simultaneously ensure texture detail, contrast, and color consistency. Experimental results on several existing datasets show that the TPC-UIE obtains excellent performance in both subjective and objective evaluations compared to other methods. 
In addition, the method significantly improves the visual quality of underwater images, which effectively facilitates further development of downstream underwater tasks.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118 ","pages":"Article 117032"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49896211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
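Editor's note: the TPC-UIE record above describes a multi-term objective of reconstruction, perceptual, and contrastive losses. A shape-only sketch of such an objective follows; the weights, the L1/L2 choices, and the contrastive ratio form are illustrative assumptions of ours, not the paper's exact losses.

```python
import numpy as np

def combined_uie_loss(pred, target, feat, feat_pos, feat_neg,
                      w_rec=1.0, w_per=0.1, w_con=0.1):
    """Illustrative multi-loss: L1 reconstruction, a feature-space
    perceptual distance, and a contrastive ratio that pulls features
    toward the clean reference (positive) and away from the degraded
    input (negative). Weights are arbitrary for the sketch."""
    rec = np.mean(np.abs(pred - target))
    per = np.mean((feat - feat_pos) ** 2)
    eps = 1e-8
    con = np.mean(np.abs(feat - feat_pos)) / (np.mean(np.abs(feat - feat_neg)) + eps)
    return w_rec * rec + w_per * per + w_con * con
```

When the prediction matches the target and its features match the positive anchor, all three terms vanish; any deviation raises the loss.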
{"title":"No-reference blurred image quality assessment method based on structure of structure features","authors":"Jian Chen , Shiyun Li , Li Lin , Jiaze Wan , Zuoyong Li","doi":"10.1016/j.image.2023.117008","DOIUrl":"https://doi.org/10.1016/j.image.2023.117008","url":null,"abstract":"<div><p><span><span><span><span><span><span>The deep structure of an image contains information that is helpful for perceiving its quality. Inspired by the deep-level image features extracted via </span>deep learning<span> methods, we propose a no-reference blurred image quality evaluation model based on the structure of structure features. In the spatial domain, novel weighted local binary patterns are proposed that leverage maximum local variation maps to extract structural features from multi-resolution images. In the </span></span>spectral domain, </span>gradient information<span> of multi-scale Log-Gabor filtered images is extracted as the structure of structure features, and combined with entropy features. Then, the features extracted from both domains are fused to form a quality perception feature vector and mapped into the quality score via support vector regression (SVR). Experiments are conducted to evaluate the performance of the proposed method on various </span></span>IQA databases, including the LIVE, CSIQ, TID2008, TID2013, CID2013, CLIVE, and BID. The experimental results show that compared with some state-of-the-art methods, our proposed method achieves better evaluation results and is more in line with the </span>human visual system<span>. 
The source code will be released at </span></span><span>https://github.com/JamesC0321/s2s_features/</span>.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118 ","pages":"Article 117008"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49844964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
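Editor's note: the blur-IQA record above builds its spatial features on weighted local binary patterns. The unweighted 3×3 LBP building block can be sketched as follows; the paper's weighting by maximum local variation maps is not reproduced here.

```python
import numpy as np

def lbp_3x3(img):
    """Basic 8-neighbour local binary pattern: each interior pixel
    becomes an 8-bit code recording which neighbours are >= the
    centre pixel."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = img[1:h - 1, 1:w - 1]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (neigh >= centre).astype(np.uint8) << bit
    return out
```

A flat region yields the all-ones code 255 (every neighbour ties the centre), while an isolated bright peak yields 0 — the codes summarise local structure, which is why their histograms work as blur-sensitive features.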
{"title":"Magnifying multimodal forgery clues for Deepfake detection","authors":"Xiaolong Liu, Yang Yu, Xiaolong Li, Yao Zhao","doi":"10.1016/j.image.2023.117010","DOIUrl":"https://doi.org/10.1016/j.image.2023.117010","url":null,"abstract":"<div><p><span>Advancements in computer vision<span><span> and deep learning have made generated Deepfake media difficult to distinguish. In addition, recent forgery techniques also modify the audio information based on the forged video, which brings new challenges. However, due to the cross-modal bias, recent multimodal detection methods do not fully explore the intra-modal and cross-modal forgery clues, which leads to limited detection performance. In this paper, we propose a novel audio-visual aware multimodal Deepfake detection framework to magnify intra-modal and cross-modal forgery clues. Firstly, to capture temporal intra-modal defects, a Forgery Clues Magnification Transformer (FCMT) module is proposed to magnify forgery clues based on sequence-level relationships. Then, the Distribution Difference based Inconsistency Computing (DDIC) module based on Jensen–Shannon divergence is designed to adaptively align </span>multimodal information for further magnifying the cross-modal inconsistency. Next, we further explore spatial artifacts by connecting multi-scale feature representations to provide comprehensive information. Finally, a </span></span>feature fusion<span> module is designed to adaptively fuse features to generate a more discriminative feature. 
Experiments demonstrate that the proposed framework outperforms independently trained models, and at the same time, yields superior generalization capability on unseen types of Deepfake.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118 ","pages":"Article 117010"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49881552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
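Editor's note: the DDIC module above scores audio-visual inconsistency with Jensen–Shannon divergence. A minimal implementation over discrete distributions, for reference:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions:
    the average KL divergence of each distribution to their mixture.
    Symmetric and bounded by log(2) (natural log base)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical distributions score 0 and disjoint ones score log 2, so the value is a natural "inconsistency" signal between two modality feature distributions.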
{"title":"Multi-scale graph neural network for global stereo matching","authors":"Xiaofeng Wang , Jun Yu , Zhiheng Sun , Jiameng Sun , Yingying Su","doi":"10.1016/j.image.2023.117026","DOIUrl":"https://doi.org/10.1016/j.image.2023.117026","url":null,"abstract":"<div><p>Currently, deep learning-based stereo matching<span><span> is solely based on local convolution networks, which lack enough global information for accurate disparity estimation. Motivated by the excellent global representation capability of graphs, a novel Multi-scale </span>Graph Neural Network<span><span> (MGNN) is proposed to essentially improve stereo matching from a global perspective. Firstly, we construct the multi-scale graph structure, where the multi-scale nodes with projected multi-scale image features<span> can be directly linked by the inner-scale and cross-scale edges, instead of solely relying on local convolutions for deep learning-based stereo matching. To enhance the spatial position information in the non-Euclidean multi-scale graph space, we further propose a multi-scale </span></span>position embedding to embed the potential position features of Euclidean space into projected multi-scale image features. Secondly, we propose the multi-scale graph feature inference to extract global context information on the multi-scale graph structure. Thus, the features can not only be globally inferred on each scale, but also be interactively inferred across different scales to comprehensively consider global context information with multi-scale receptive fields. 
Finally, MGNN is deployed into dense stereo matching and experiments demonstrate that our method achieves state-of-the-art performance on Scene Flow, KITTI 2012/2015, and Middlebury Stereo Evaluation v.3/2021.</span></span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118 ","pages":"Article 117026"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49844965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
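Editor's note: the MGNN record above stacks feature inference over inner-scale and cross-scale edges. The basic operation underneath any such GNN is neighbourhood aggregation; a toy mean-aggregation step (ignoring MGNN's learned weights and multi-scale structure) looks like this:

```python
import numpy as np

def mean_aggregate(node_feats, edges):
    """One step of mean-aggregation message passing: each node with
    outgoing edges (i, j) replaces its feature by the mean of its
    neighbours' features; isolated nodes keep theirs."""
    n = node_feats.shape[0]
    out = node_feats.copy()
    for i in range(n):
        neigh = [j for (a, j) in edges if a == i]
        if neigh:
            out[i] = node_feats[neigh].mean(axis=0)
    return out
```

Stacking such steps lets information propagate beyond a local window, which is the "global context" argument the abstract makes against purely convolutional cost aggregation.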
{"title":"Enhancing transferability of adversarial examples with pixel-level scale variation","authors":"Zhongshu Mao , Yiqin Lu , Zhe Cheng , Xiong Shen","doi":"10.1016/j.image.2023.117020","DOIUrl":"https://doi.org/10.1016/j.image.2023.117020","url":null,"abstract":"<div><p>The transferability of adversarial examples under the black-box attack setting has attracted extensive attention from the community. Input transformation is one of the most effective approaches to improve the transferability among all methods proposed recently. However, existing methods either only slightly improve transferability or are not robust to defense models. We delve into the generation process of adversarial examples and find that existing input transformation methods tend to craft adversarial examples by transforming the entire image, which we term image-level transformations. This naturally motivates us to perform pixel-level transformations, i.e., transforming only part of the pixels of the image. Experimental results show that pixel-level transformations can considerably enhance the transferability of the adversarial examples while still being robust to defense models. We believe that pixel-level transformations are more fine-grained than image-level transformations, and thus can achieve better performance. Based on this finding, we propose the pixel-level scale variation (PSV) method to further improve the transferability of adversarial examples. The proposed PSV randomly samples a set of scaled mask matrices and transforms part of the pixels of the input image with these matrices to increase the pixel-level diversity. Empirical evaluations on the standard ImageNet dataset demonstrate the effectiveness and superior performance of the proposed PSV on both normally trained models (with the highest average attack success rate of 79.2%) and defense models (with the highest average attack success rate of 61.4%). 
Our method can further improve transferability (with the highest average attack success rate of 88.2%) by combining it with other input transformation methods.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118 ","pages":"Article 117020"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49844961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
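Editor's note: the PSV record above scales only part of an image's pixels via random mask matrices. A toy sketch of one such pixel-level transformation follows; the mask probability and single scale factor are our illustrative choices — the paper samples a whole set of scaled masks per attack iteration.

```python
import numpy as np

def pixel_level_scale(image, rng, scale=0.5, keep_prob=0.5):
    """Apply a scale factor to a random subset of pixels (chosen by a
    binary mask); the remaining pixels pass through unchanged."""
    mask = (rng.random(image.shape) < keep_prob).astype(image.dtype)
    return image * (1 - mask) + (image * scale) * mask
```

Averaging gradients over many such randomly masked copies is the kind of input diversity the abstract credits for better black-box transferability.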
{"title":"A coarse-to-fine multi-scale feature hybrid low-dose CT denoising network","authors":"Zefang Han , Hong Shangguan, Xiong Zhang, Xueying Cui, Yue Wang","doi":"10.1016/j.image.2023.117009","DOIUrl":"https://doi.org/10.1016/j.image.2023.117009","url":null,"abstract":"<div><p><span><span>With the growing development and wide clinical application of CT technology, the potential radiation damage to patients has sparked public concern. However, reducing the radiation dose may cause large amounts of noise and artifacts in the reconstructed images, which may affect the accuracy of the clinical diagnosis. Therefore, improving the quality of low-dose CT scans has become a popular research topic. Generative adversarial networks (GANs) have provided new research ideas for low-dose CT (LDCT) denoising. However, utilizing only image decomposition or adding new functional </span>subnetworks<span> cannot effectively fuse features of the same type at different scales (or features of different types). Thus, most current GAN-based denoising networks often suffer from low feature utilization and increased network complexity. To address these problems, we propose a coarse-to-fine multiscale feature hybrid low-dose CT denoising network (CMFHGAN). The generator consists of a global denoising module, local texture feature enhancement module, and self-calibration </span></span>feature fusion<span> module. The three modules complement each other and guarantee overall denoising performance. In addition, to further improve the denoising performance, we propose a multi-resolution inception discriminator with multiscale feature extraction ability. 
Experiments were performed on the Mayo and Piglet datasets, and the results showed that the proposed method outperformed the state-of-the-art denoising algorithms.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118 ","pages":"Article 117009"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49845014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
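Editor's note: denoising results like those reported on the Mayo and Piglet datasets are conventionally scored with PSNR (alongside SSIM). For reference, the metric in a few lines:

```python
import numpy as np

def psnr(reference, denoised, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10*log10(peak^2 / MSE).
    Higher is better; identical images give infinity."""
    mse = np.mean((reference - denoised) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)
```

For images in [0, 1], a uniform error of 0.1 gives exactly 20 dB, which is a handy sanity check when wiring up an evaluation loop.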