{"title":"FAVER: Blind quality prediction of variable frame rate videos","authors":"Qi Zheng , Zhengzhong Tu , Pavan C. Madhusudana , Xiaoyang Zeng , Alan C. Bovik , Yibo Fan","doi":"10.1016/j.image.2024.117101","DOIUrl":"10.1016/j.image.2024.117101","url":null,"abstract":"<div><p><span><span>Video quality assessment (VQA) remains an important and challenging problem that affects many applications at the widest scales. Recent advances in mobile devices<span><span> and cloud computing techniques have made it possible to capture, process, and share high resolution, high frame rate (HFR) videos across the Internet nearly instantaneously. Being able to monitor and control the quality of these streamed videos can enable the delivery of more enjoyable content and perceptually optimized rate control. Accordingly, there is a pressing need to develop VQA models that can be deployed at enormous scales. While some recent efforts have been devoted to full-reference (FR) analysis of variable frame rate and HFR video quality, the development of no-reference (NR) VQA algorithms targeting frame rate variations has been little studied. Here, we propose a first-of-a-kind blind VQA model for evaluating HFR videos, which we dub the Framerate-Aware Video </span>Evaluator w/o Reference (FAVER). FAVER uses extended models of spatial natural scene statistics that encompass space–time wavelet-decomposed video signals, and leverages the advantages of the </span></span>deep neural network to provide motion perception and to conduct efficient frame rate-sensitive quality prediction. Our extensive experiments on several HFR video quality datasets show that FAVER outperforms other blind VQA algorithms at a reasonable computational cost. 
To facilitate reproducible research and public evaluation, an implementation of FAVER is being made freely available online: </span><span>https://github.com/uniqzheng/HFR-BVQA</span>.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"122 ","pages":"Article 117101"},"PeriodicalIF":3.5,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139422016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
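FAVER's spatial natural scene statistics features come from fitting parametric distributions to wavelet-decomposed video coefficients. As a rough illustration of that ingredient only (not the authors' exact pipeline; the function name and grid search are our own choices), the sketch below estimates the shape parameter of a generalized Gaussian distribution by moment matching, a standard step in NSS-based quality models:

```python
import math
import numpy as np

def estimate_ggd_shape(coeffs):
    """Moment-matching estimate of the GGD shape parameter beta.

    For a zero-mean GGD, rho(beta) = Gamma(2/beta)^2 / (Gamma(1/beta) * Gamma(3/beta))
    equals (E|x|)^2 / E[x^2]; we compute the empirical ratio and invert rho on a grid.
    """
    x = np.asarray(coeffs, dtype=float).ravel()
    r_hat = np.mean(np.abs(x)) ** 2 / np.mean(x ** 2)
    grid = np.arange(0.2, 10.0, 0.001)
    rho = np.array([math.gamma(2.0 / b) ** 2 /
                    (math.gamma(1.0 / b) * math.gamma(3.0 / b)) for b in grid])
    return grid[np.argmin(np.abs(rho - r_hat))]
```

Gaussian data should yield beta near 2 and Laplacian data beta near 1; deviations of real bandpass coefficients from these values are what NSS-based quality predictors exploit.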
{"title":"Stereo vision based systems for sea-state measurement and floating structures monitoring","authors":"Omar Sallam, Rihui Feng, Jack Stason, Xinguo Wang, Mirjam Fürth","doi":"10.1016/j.image.2023.117088","DOIUrl":"10.1016/j.image.2023.117088","url":null,"abstract":"<div><p><span>Using computer vision<span> techniques such as stereo vision systems for sea state measurement or for </span></span>offshore structures<span><span> monitoring can improve the measurement fidelity<span> and accuracy with no significant additional cost. In this paper, two experiments (in-lab/open-sea) are conducted to study the performance of a stereo vision system in measuring the water wave surface elevation and rigid body heaving motion. For the in-lab experiment, regular water waves are generated in a wave tank for different frequencies and wave heights, where the water surface is scanned by the stereo vision camera installed on the top of the tank. Surface elevation inferred by the stereo vision is verified by a stationary side camera that records the water surface through the tank's transparent side window; the water surface elevation in the side camera recordings is extracted using an edge detection algorithm. During the in-lab experiment a heaving buoy is installed to test the performance of a Visual Simultaneous </span></span>Localization<span> and Mapping (VSLAM) algorithm to monitor the buoy heave motion. The VSLAM algorithm fuses the buoy's onboard stereo vision recordings with an embedded Inertial Measurement Unit<span> (IMU) to estimate the 6-DOF motion of a rigid body. The buoy motion VSLAM measurements are verified by a KLT tracking algorithm implemented on the video recordings of the stationary side camera. The open-sea experiment is conducted in Lake Somerville, Texas. The stereo vision system is installed to measure the water surface elevation and directional spectrum of the wind-generated irregular waves. 
The open-sea wave measurements by the stereo vision are verified by a Sofar commercial wave buoy deployed at the testing location.</span></span></span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"122 ","pages":"Article 117088"},"PeriodicalIF":3.5,"publicationDate":"2024-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139374052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
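The core geometric step behind stereo sea-state measurement is triangulation: for a calibrated, rectified pair, depth is Z = f·B/d, and for a downward-looking rig over water the surface elevation follows by subtracting depth from the camera height. A minimal sketch (the rig geometry and parameter names are illustrative assumptions, not the authors' setup):

```python
import numpy as np

def disparity_to_elevation(disparity_px, focal_px, baseline_m, camera_height_m):
    """Triangulate depth from stereo disparity (Z = f * B / d) and convert it
    to surface elevation for a downward-looking rig mounted above the water."""
    depth_m = focal_px * baseline_m / np.asarray(disparity_px, dtype=float)
    return camera_height_m - depth_m
```

For example, with a 1000 px focal length, 0.1 m baseline, and the cameras 2.5 m above the mean surface, a disparity of 50 px corresponds to a point 0.5 m above the mean level.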
{"title":"Enhancing fine-detail image synthesis from text descriptions by text aggregation and connection fusion module","authors":"Huaping Zhou , Tao Wu , Senmao Ye , Xinru Qin , Kelei Sun","doi":"10.1016/j.image.2023.117099","DOIUrl":"10.1016/j.image.2023.117099","url":null,"abstract":"<div><p><span><span>Synthesizing images with fine details from text descriptions is a challenge. The existing single-stage generative adversarial networks<span> (GANs) fuse sentence features into the image generation process through affine transformation, which alleviates the problems of missing details and heavy computation found in stacked networks. However, existing single-stage networks ignore the word features in the text description, resulting in a lack of detail in the generated image. To address this issue, we propose a text aggregation module (TAM) to fuse sentence features and word features in a text by a simple spatial </span></span>attention mechanism. Then we build a text connection fusion (TCF) block consisting mainly of a gated </span>recurrent<span> unit (GRU) and an up-sampling block. It connects text features across the up-sampling blocks to improve text utilization. Besides, to further improve the semantic consistency between text and the generated images, we introduce the deep attentional multimodal similarity model (DAMSM) loss, which monitors the similarity between the text and the generated images and improves semantic consistency. 
Experimental results show that our method is superior to state-of-the-art models on the CUB and COCO datasets in terms of both image fidelity and semantic consistency with the text.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"122 ","pages":"Article 117099"},"PeriodicalIF":3.5,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139093167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
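The text aggregation idea above — each spatial location of the image feature map attending over word features, with the sentence feature broadcast on top — can be sketched in a few lines of numpy. The dimensions, the scaled dot-product form, and the additive sentence fusion are our simplifying assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_text(image_feats, word_feats, sentence_feat):
    """Fuse word and sentence features into spatial image features.

    image_feats: (HW, d) flattened feature map; word_feats: (T, d);
    sentence_feat: (d,). Each spatial location attends over the T words,
    and the attended word context is added to the broadcast sentence feature.
    """
    d = image_feats.shape[1]
    attn = softmax(image_feats @ word_feats.T / np.sqrt(d), axis=1)  # (HW, T)
    word_context = attn @ word_feats                                 # (HW, d)
    fused = word_context + sentence_feat[None, :]                    # (HW, d)
    return fused, attn
```

Each row of `attn` is a probability distribution over words, so regions of the image can specialize to different words — the mechanism by which word-level detail reaches the generator.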
{"title":"Analyzing the effect of shot noise in indirect Time-of-Flight cameras","authors":"Nofre Sanmartin-Vich , Javier Calpe , Filiberto Pla","doi":"10.1016/j.image.2023.117089","DOIUrl":"10.1016/j.image.2023.117089","url":null,"abstract":"<div><p>Continuous wave indirect Time-of-Flight cameras obtain depth images by emitting a modulated continuous light wave and measuring the delay of the received signal. In this paper, we generalize the estimation of the effect of the shot noise when obtaining the phase delay with an arbitrary number of points in the Discrete Fourier Transform<span>, extending and generalizing the analysis done in previous works for the case of four points. For that particular case, we compare our analysis with the state of the art. Moreover, we extend the error model using a second order approximation in the error propagation analysis, which provides more accurate estimations according to the Monte Carlo simulation experiments. The analysis, based on both analytical and numerical methods, shows that the phase error is, in general, related to the exposure time and only weakly to the number of points in the Discrete Fourier Transform. It also depends on the background illumination level, on the amplitude of the received signal, and, when using a three-point DFT, on the distance to the objects.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"122 ","pages":"Article 117089"},"PeriodicalIF":3.5,"publicationDate":"2023-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139065281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
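In a continuous-wave iToF camera, the phase delay is recovered from N equally spaced correlation samples via the first DFT bin, and the effect of shot noise can be probed by a Monte Carlo simulation with Poisson-distributed photon counts. A self-contained sketch of that generic setup (the amplitude, offset, phase, and sample counts are arbitrary illustrative values, not from the paper):

```python
import numpy as np

def dft_phase(samples):
    """Phase of the fundamental estimated from N equally spaced correlation
    samples s_k = B + A*cos(phi - 2*pi*k/N), via the first DFT bin."""
    n = samples.shape[-1]
    theta = 2.0 * np.pi * np.arange(n) / n
    return np.arctan2(samples @ np.sin(theta), samples @ np.cos(theta))

# Monte Carlo shot-noise experiment: photon counts are Poisson distributed,
# so the noise variance of each sample equals its mean count.
rng = np.random.default_rng(1)
amp, offset, phi, n = 400.0, 1000.0, 0.9, 4
clean = offset + amp * np.cos(phi - 2.0 * np.pi * np.arange(n) / n)
noisy = rng.poisson(clean, size=(20000, n)).astype(float)
estimates = dft_phase(noisy)
phase_err_std = estimates.std()
```

Varying `amp` (received-signal amplitude) and `offset` (background level) in this simulation reproduces the qualitative dependencies the abstract describes: the phase error grows with background illumination and shrinks with signal amplitude.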
{"title":"Quantitative analysis of facial soft tissue using weighted cascade regression model applicable for facial plastic surgery","authors":"Ali Fahmi Jafargholkhanloo, Mousa Shamsi","doi":"10.1016/j.image.2023.117086","DOIUrl":"10.1016/j.image.2023.117086","url":null,"abstract":"<div><p>Localization of facial landmarks plays an important role in the measurement of facial metrics applicable for beauty analysis and facial plastic surgery. The first step in detecting facial landmarks is to estimate the face bounding box. Clinical images of patients' faces usually show intensity non-uniformity. These conditions cause common face detection algorithms to perform poorly under varying illumination. To solve this problem, a modified fuzzy c-means (MFCM) algorithm is used with varying illumination modeling. The cascade regression method (CRM) performs well in face alignment. This algorithm has two main drawbacks. (1) In the training phase, increasing the real data without considering normal data can lead to over-fitting. To solve this problem, a weighted CRM (WCRM) is presented. (2) In the test phase, using a mean shape causes the initial shape to be either close to or far from the true face shape. To overcome this problem, a Procrustes-based analysis is presented. One of the most important steps in facial landmark localization is feature extraction. In this study, to increase the detection accuracy of cephalometric landmarks, local phase quantization (LPQ) is used for feature extraction in all three channels of the RGB color space. Finally, the proposed algorithm is used to measure facial anthropometric metrics. 
Experimental results show that the proposed algorithm performs better in facial landmark localization than the other compared algorithms.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"121 ","pages":"Article 117086"},"PeriodicalIF":3.5,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138547351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
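The Procrustes-based shape initialization mentioned in the abstract rests on the classical similarity-transform alignment of one landmark set onto another. A sketch of the generic estimator in its Umeyama form (how exactly it is used inside the WCRM is described in the paper; this is only the alignment step itself):

```python
import numpy as np

def procrustes_align(source, target):
    """Align source landmarks (N, 2) onto target landmarks (N, 2) with the
    least-squares optimal similarity transform (scale, rotation, translation)."""
    mu_s, mu_t = source.mean(0), target.mean(0)
    s_c, t_c = source - mu_s, target - mu_t
    n = len(source)
    cov = t_c.T @ s_c / n                      # cross-covariance (target x source)
    u, sing, vt = np.linalg.svd(cov)
    d = np.ones(cov.shape[0])
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        d[-1] = -1.0                           # guard against reflections
    rot = u @ np.diag(d) @ vt                  # rotation mapping source -> target
    scale = (sing * d).sum() / ((s_c ** 2).sum() / n)
    return scale * s_c @ rot.T + mu_t
```

Applied to a mean shape and a rough landmark estimate, this produces an initial contour already adjusted for the face's position, size, and in-plane rotation, which is exactly what a cascade regressor needs to start close to the true shape.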
{"title":"DSRNet: Depth Super-Resolution Network guided by blurry depth and clear intensity edges","authors":"Hui Lan, Cheolkon Jung","doi":"10.1016/j.image.2023.117064","DOIUrl":"https://doi.org/10.1016/j.image.2023.117064","url":null,"abstract":"<div><p><span><span>Although high resolution (HR) depth images are required in many applications such as virtual reality and autonomous navigation<span>, the resolution and quality of those generated by consumer depth cameras fall short of the requirements. Existing depth upsampling methods focus on extracting multiscale features of the HR color image to guide low resolution (LR) depth upsampling, thus causing blurry and inaccurate depth edges. In this paper, we propose a depth super-resolution (SR) network guided by blurry depth and clear intensity edges, called DSRNet. DSRNet differentiates effective edges from a number of HR edges with the guidance of blurry depth and clear intensity edges. First, we perform global residual estimation based on an encoder–decoder architecture to extract edge structure from the HR color image for depth SR. Then, we distinguish effective edges from HR edges on the decoder side with the guidance of LR depth upsampling. To maintain edges for depth SR, we use intensity edge guidance that extracts clear intensity edges from the HR image. Finally, we use a residual loss to generate an accurate high frequency (HF) residual and reconstruct HR depth maps. 
Experimental results show that DSRNet successfully reconstructs depth edges in SR results and outperforms the state-of-the-art methods in terms of visual quality and </span></span>quantitative measurements.</span><span><sup>1</sup></span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"121 ","pages":"Article 117064"},"PeriodicalIF":3.5,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138490174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A dual fusion deep convolutional network for blind universal image denoising","authors":"Zhiyu Lyu, Yan Chen, Haojun Sun, Yimin Hou","doi":"10.1016/j.image.2023.117077","DOIUrl":"https://doi.org/10.1016/j.image.2023.117077","url":null,"abstract":"<div><p><span>Blind image denoising and edge preservation are two primary challenges in recovering an image from low-level vision to high-level vision. Blind denoising requires that a single denoiser be able to denoise images with any intensity of noise, and it has practical utility since accurate noise levels cannot be acquired from realistic images. On the other hand, </span>edge preservation<span><span> can provide more image features for subsequent processing, which is also important for denoising. In this paper, we propose a novel blind universal image denoiser to remove synthetic and realistic noise while preserving image texture. The denoiser consists of a noise network and a prior network in parallel; a fusion block then weights the two networks to balance computation cost and denoising performance. We also use the Non-subsampled Shearlet Transform (NSST) to enlarge the size of the receptive field to obtain more detailed information. Extensive denoising experiments on </span>synthetic images and realistic images show the effectiveness of our denoiser.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"120 ","pages":"Article 117077"},"PeriodicalIF":3.5,"publicationDate":"2023-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134656277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
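The fusion block's role — weighting the parallel noise and prior branches — can be illustrated with a per-pixel sigmoid gate that produces a convex combination of the two feature maps. The block in the paper is a learned convolutional module; the gate parameters `w` and `b` below are hypothetical stand-ins for illustration only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_branches(noise_feat, prior_feat, w, b):
    """Blend two parallel (H, W) feature maps with a per-pixel sigmoid gate.

    The gate looks at both branch values at each pixel and outputs a weight
    in (0, 1); the result is an elementwise convex combination of the branches.
    """
    stacked = np.stack([noise_feat, prior_feat], axis=-1)  # (H, W, 2)
    gate = sigmoid(stacked @ w + b)                        # (H, W), in (0, 1)
    return gate * noise_feat + (1.0 - gate) * prior_feat
```

Because the output is a convex combination, it always lies between the two branch values at every pixel — the gate can only trade one branch off against the other, which is what makes such a block a controllable cost/quality dial.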
{"title":"ClGanNet: A novel method for maize leaf disease identification using ClGan and deep CNN","authors":"Vivek Sharma , Ashish Kumar Tripathi , Purva Daga , Nidhi M. , Himanshu Mittal","doi":"10.1016/j.image.2023.117074","DOIUrl":"https://doi.org/10.1016/j.image.2023.117074","url":null,"abstract":"<div><p>With the advancement of technologies, automatic plant leaf disease detection has received considerable attention from researchers working in the area of precision agriculture. A number of deep learning-based methods have been introduced in the literature for automated plant disease detection. However, the majority of datasets collected from real fields have blurred background information, data imbalances, limited generalization, and tiny lesion features, which may lead to over-fitting of the model. Moreover, the increased parameter size of deep learning models is also a concern, especially for agricultural applications due to limited resources. In this paper, a novel ClGan (Crop Leaf Gan) with an improved loss function has been developed with a reduced number of parameters as compared to the existing state-of-the-art methods. The generator and discriminator of the developed ClGan are equipped with an encoder–decoder network to avoid the vanishing gradient problem, training instability, and non-convergence failure while preserving complex intricacies during synthetic image generation with significant lesion differentiation. The proposed improved loss function introduces a dynamic correction factor that stabilizes learning while maintaining effective weight optimization. In addition, a novel plant leaf classification method, ClGanNet, has been introduced to classify plant diseases efficiently. 
The efficiency of the proposed ClGan was validated on the maize leaf dataset in terms of the number of parameters and FID score, and the results are compared against five other state-of-the-art GAN models, namely DC-GAN, W-GAN, <span><math><mrow><mi>W</mi><mi>G</mi><mi>a</mi><msub><mrow><mi>n</mi></mrow><mrow><mi>G</mi><mi>P</mi></mrow></msub></mrow></math></span>, InfoGan, and LeafGan. Moreover, the performance of the proposed classifier, ClGanNet, was evaluated with seven state-of-the-art methods against eight parameters on the original, basic augmented, and ClGan augmented datasets. Experimental results show that ClGanNet outperformed all the considered methods with 99.97% training and 99.04% testing accuracy while using the fewest parameters.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"120 ","pages":"Article 117074"},"PeriodicalIF":3.5,"publicationDate":"2023-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91987222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
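The FID score used above to validate ClGan compares Gaussians fitted to feature embeddings of real and generated images: FID = ||mu1 − mu2||² + Tr(C1 + C2 − 2(C1·C2)^(1/2)). A small numpy sketch of the metric itself (real FID uses Inception-v3 features; the features here are synthetic placeholders):

```python
import numpy as np

def fid_from_stats(mu1, cov1, mu2, cov2):
    """Frechet distance between two Gaussians N(mu1, cov1) and N(mu2, cov2)."""
    diff = mu1 - mu2
    # For PSD cov1, cov2 the eigenvalues of cov1 @ cov2 are real and
    # nonnegative, and Tr((cov1 cov2)^(1/2)) is the sum of their square roots.
    eigs = np.linalg.eigvals(cov1 @ cov2)
    tr_sqrt = np.sqrt(np.clip(eigs.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)

def fid(feats_a, feats_b):
    """FID between two (n_samples, dim) feature arrays."""
    return fid_from_stats(feats_a.mean(0), np.cov(feats_a, rowvar=False),
                          feats_b.mean(0), np.cov(feats_b, rowvar=False))
```

Identical feature sets give FID near 0, and shifting one set's mean raises it by the squared distance between the means — lower is better, which is why a low FID indicates that ClGan's synthetic leaves resemble the real distribution.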
{"title":"Image tone mapping based on clustering and human visual system models","authors":"Xueyu Han , Ishtiaq Rasool Khan , Susanto Rahardja","doi":"10.1016/j.image.2023.117075","DOIUrl":"10.1016/j.image.2023.117075","url":null,"abstract":"<div><p><span><span>Natural scenes generally have a very high dynamic range (HDR), which cannot be captured in standard dynamic range (SDR) images. HDR imaging techniques can be used to capture details in both dark and bright regions, and the resultant HDR images can be tone mapped to reproduce them on SDR displays. To adapt to different applications, the tone mapping operator (TMO) should be able to achieve high performance for diverse HDR scenes. In this paper, we present a clustering-based TMO by embedding </span>human visual system models that function effectively in different scenes. A hierarchical scheme is applied for clustering to reduce the </span>computational complexity<span>. We also propose a detail preservation method by superimposing the details of original HDR images to enhance local contrasts, and a color preservation method by limiting the adaptive saturation parameter to control color saturation attenuation. The effectiveness of our method is assessed by comparing it with state-of-the-art TMOs quantitatively on large-scale HDR datasets and qualitatively with a group of subjects. 
Experimental results of both objective and subjective evaluations show that the proposed method achieves improvements over the competing methods in generating high-quality tone-mapped images with good contrast and natural color appearance for diverse HDR scenes.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"120 ","pages":"Article 117075"},"PeriodicalIF":3.5,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136093478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
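For context on what a TMO does, the classic global photographic operator shows the basic shape of HVS-inspired tone mapping: scale luminance by its log-average "key", then compress with Ld = s·(1 + s/Lw²)/(1 + s), which maps the white point Lw to 1. This is a textbook baseline for comparison, not the clustering-based method proposed above:

```python
import numpy as np

def reinhard_tonemap(luminance, key=0.18, white_point=None):
    """Global photographic tone mapping of an HDR luminance array to [0, 1].

    Scales luminance by its log-average (the 'key' value), then compresses
    with Ld = s * (1 + s / Lw^2) / (1 + s); the white point Lw maps to 1.
    """
    lum = np.asarray(luminance, dtype=float)
    log_avg = np.exp(np.mean(np.log(1e-6 + lum)))   # log-average luminance
    s = key * lum / log_avg
    lw = s.max() if white_point is None else white_point
    return s * (1.0 + s / lw ** 2) / (1.0 + s)
```

The operator is monotonic, so it preserves luminance ordering; what it cannot do is adapt its curve per region or per scene type, which is the gap that clustering-based and locally adaptive TMOs address.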
{"title":"Individual tooth segmentation in human teeth images using pseudo edge-region obtained by deep neural networks","authors":"Seongeun Kim, Chang-Ock Lee","doi":"10.1016/j.image.2023.117076","DOIUrl":"https://doi.org/10.1016/j.image.2023.117076","url":null,"abstract":"<div><p><span><span>In human teeth images taken outside the oral cavity with a general optical camera, it is difficult to segment individual teeth due to common obstacles such as weak edges, intensity inhomogeneities and strong light reflections. In this work, we propose a method for segmenting individual teeth in human teeth images. The key to this method is to obtain a pseudo edge-region using </span>deep neural networks. After an additional step to obtain </span>initial contours<span><span> for each tooth region, each individual tooth is segmented by applying active contour models. We also present a strategy using existing model-based methods for labeling the data required for </span>neural network training.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"120 ","pages":"Article 117076"},"PeriodicalIF":3.5,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91987221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Engineering Technology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}