{"title":"HDR-ChipQA: No-reference quality assessment on High Dynamic Range videos","authors":"Joshua P. Ebenezer , Zaixi Shang , Yongjun Wu , Hai Wei , Sriram Sethuraman , Alan C. Bovik","doi":"10.1016/j.image.2024.117191","DOIUrl":"10.1016/j.image.2024.117191","url":null,"abstract":"<div><p>We present a no-reference video quality model and algorithm that delivers standout performance for High Dynamic Range (HDR) videos, which we call HDR-ChipQA. HDR videos represent wider ranges of luminances, details, and colors than Standard Dynamic Range (SDR) videos. The growing adoption of HDR in massively scaled video networks has driven the need for video quality assessment (VQA) algorithms that better account for distortions on HDR content. In particular, standard VQA models may fail to capture conspicuous distortions at the extreme ends of the dynamic range, because the features that drive them may be dominated by distortions that pervade the mid-ranges of the signal. We introduce a new approach whereby a local expansive nonlinearity emphasizes distortions occurring at the higher and lower ends of the local luma range, allowing for the definition of additional quality-aware features that are computed along a separate path. These features are not HDR-specific, and also improve VQA on SDR video contents, albeit to a reduced degree. We show that this preprocessing step significantly boosts the power of distortion-sensitive natural video statistics (NVS) features when used to predict the quality of HDR content. In similar manner, we separately compute novel wide-gamut color features using the same nonlinear processing steps. We have found that our model significantly outperforms SDR VQA algorithms on the only publicly available, comprehensive HDR database, while also attaining state-of-the-art performance on SDR content.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117191"},"PeriodicalIF":3.4,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142049930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A virtual-reality spatial matching algorithm and its application on equipment maintenance support: System design and user study","authors":"Xiao Yang , Fanghao Huang , Jiacheng Jiang , Zheng Chen","doi":"10.1016/j.image.2024.117188","DOIUrl":"10.1016/j.image.2024.117188","url":null,"abstract":"<div><p>Equipment maintenance support is an important technical measure to maintain the equipment’s expected performance. However, the current maintenance supports are mainly completed by maintainers under the guidance of technical manual or additional experts, which may be insufficient for some advanced equipment with rapid update rate and complex inner structure. The rising technology of augmented reality (AR) provides a new solution for equipment maintenance support, while one of the key issues limiting the practical application of AR in maintenance field is the spatial matching issue between virtual space and reality space. In this paper, a virtual-reality spatial matching algorithm is designed to accurately superimpose the virtual information to the corresponding actual scene on the AR glasses. In this algorithm, two methods are proposed to help achieve the stable matching of virtual space and reality space. In detail, to obtain the saliency map with less background interference and improved saliency detection accuracy, a saliency detection method is designed based on the super-pixel segmentation. To deal with the problems of uneven distribution on the feature points and weak robustness to the light changes, a feature extraction and matching method is proposed for acquiring the feature point matching set with the utilization of the obtained saliency map. Finally, an immersive equipment maintenance support system (IEMSS) is developed based on this spatial matching algorithm, which provides the maintainers with immediate and immersive guidance to improve the efficiency and safety in the maintenance task, as well as offers maintenance training for inexperienced maintainers with expanded virtual information in case of limited experts. Several comparative experiments are implemented to verify the effectiveness of proposed methods, and a user study of real system application is carried out to further evaluate the superiority of these methods when applied in the IEMSS.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117188"},"PeriodicalIF":3.4,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141998380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A ‘deep’ review of video super-resolution","authors":"Subhadra Gopalakrishnan, Anustup Choudhury","doi":"10.1016/j.image.2024.117175","DOIUrl":"10.1016/j.image.2024.117175","url":null,"abstract":"<div><p>Video super-resolution (VSR) is an ill-posed inverse problem where the goal is to obtain high-resolution video content from a low-resolution counterpart. In this survey, we trace the history of video super-resolution techniques beginning with traditional methods, showing the evolution towards techniques that use shallow networks and finally, the recent trends where deep learning algorithms result in state-of-the-art performance. Specifically, we consider 60 neural network-based VSR techniques in addition to 8 traditional VSR techniques. We extensively cover the deep learning-based techniques including the latest models and introduce a novel taxonomy depending on their architecture. We discuss the pros and cons of each category of techniques. We consider the various components of the problem including the choice of loss functions, evaluation criteria and the benchmark datasets used for evaluation. We present a comparison of the existing techniques using common datasets, providing insights into the relative rankings of these methods. We compare the network architectures based on their computation speed and the network complexity. We also discuss the limitations of existing loss functions and the evaluation criteria that are currently used and propose alternate suggestions. Finally, we identify some of the current challenges and provide future research directions towards video super-resolution, thus providing a comprehensive understanding of the problem.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117175"},"PeriodicalIF":3.4,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141852723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comprehensive review of quality of experience for emerging video services","authors":"Weiling Chen , Fengquan Lan , Hongan Wei , Tiesong Zhao , Wei Liu , Yiwen Xu","doi":"10.1016/j.image.2024.117176","DOIUrl":"10.1016/j.image.2024.117176","url":null,"abstract":"<div><p>The recent advances in multimedia technology have significantly expanded the range of audio–visual applications. The continuous enhancement of display quality has led to the emergence of new attributes in video, such as enhanced visual immersion and widespread availability. Within media content, the video signals are presented in various formats including stereoscopic/3D, panoramic/360°and holographic images. The signals are also combined with other sensory elements, such as audio, tactile, and olfactory cues, creating a comprehensive multi-sensory experience for the user. The development of both qualitative and quantitative Quality of Experience (QoE) metrics is crucial for enhancing the subjective experience in immersive scenarios, providing valuable guidelines for system enhancement. In this paper, we review the most recent achievements in QoE assessment for immersive scenarios, summarize the current challenges related to QoE issues, and present outlooks of QoE applications in these scenarios. The aim of our overview is to offer a valuable reference for researchers in the domain of multimedia delivery.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"128 ","pages":"Article 117176"},"PeriodicalIF":3.4,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141840586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learned fractional downsampling network for adaptive video streaming","authors":"Li-Heng Chen , Christos G. Bampis , Zhi Li , Joel Sole , Chao Chen , Alan C. Bovik","doi":"10.1016/j.image.2024.117172","DOIUrl":"10.1016/j.image.2024.117172","url":null,"abstract":"<div><p>Given increasing demand for very large format contents and displays, spatial resolution changes have become an important part of video streaming. In particular, video downscaling is a key ingredient that streaming providers implement in their encoding pipeline as part of video quality optimization workflows. Here, we propose a downsampling network architecture that progressively reconstructs residuals at different scales. Since the layers of convolutional neural networks (CNNs) can only be used to alter the resolutions of their inputs by integer scale factors, we seek new ways to achieve fractional scaling, which is crucial in many video processing applications. More concretely, we utilize an alternative building block, formulated as a conventional convolutional layer followed by a differentiable resizer. To validate the efficacy of our proposed downsampling network, we integrated it into a modern video encoding system for adaptive streaming. We extensively evaluated our method using a variety of different video codecs and upsampling algorithms to show its generality. The experimental results show that improvements in coding efficiency over the conventional Lanczos algorithm and state-of-the-art methods are attained, in terms of PSNR, SSIM, and VMAF, when tested on high-resolution test videos. In addition to quantitative experiments, we also carried out a subjective quality study, validating that the proposed downsampling model yields favorable results.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"128 ","pages":"Article 117172"},"PeriodicalIF":3.4,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141772969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A modified hue and range preserving color assignment function with a component-wise saturation adjustment for color image enhancement","authors":"Sepideh Khormaeipour, Fatemeh Shakeri","doi":"10.1016/j.image.2024.117174","DOIUrl":"10.1016/j.image.2024.117174","url":null,"abstract":"<div><p>This paper presents a new approach to enhancing color images by modifying an affine color assignment function. This function maps colors to pixels of the enhanced gray-scale image in a way that improves the visual quality of the image, particularly in darker regions. The main goal of our method is to finely adjust saturation, correct saturation loss in specific image regions, and preserve the original image’s range and hue. Our proposed method follows a two-step process. First, it enhances the intensity image using a combination of global and local histogram equalization methods. This results in an overall improved appearance by redistributing pixel intensities and enhancing contrast. Then, modified color mapping functions are applied to assign colors to each pixel of the enhanced gray-scale image. The aim is to adjust saturation by amplifying the maximally saturated color image. Additionally, we introduce two new color-weighted maps to evaluate pixel importance from the maximally saturated image. This contributes to saturation control in the final enhanced image. Compared to alternative color mapping algorithms, our model preserves the original color of pixels in challenging areas and fine-tunes saturation based on parameter settings.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"128 ","pages":"Article 117174"},"PeriodicalIF":3.4,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141851907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MGFA : A multi-scale global feature autoencoder to fuse infrared and visible images","authors":"Xiaoxuan Chen , Shuwen Xu , Shaohai Hu , Xiaole Ma","doi":"10.1016/j.image.2024.117168","DOIUrl":"10.1016/j.image.2024.117168","url":null,"abstract":"<div><p>Since the convolutional operation pays too much attention to local information, resulting in the loss of global information and a decline in fusion quality. In order to ensure that the fused image fully captures the features of the entire scene, an end-to-end Multi-scale Global Feature Autoencoder (MGFA) is proposed in this paper, which can generate fused images with both global and local information. In this network, a multi-scale global feature extraction module is proposed, which combines dilated convolutional modules with the Global Context Block (GCBlock) to extract the global features ignored by the convolutional operation. In addition, an adaptive embedded residual fusion module is proposed to fuse different frequency components in the source images with the idea of embedded residual learning. This can enrich the detailed texture of the fused results. Extensive qualitative and quantitative experiments have demonstrated that the proposed method can achieve excellent results in retaining global information and improving visual effects. Furthermore, the fused images obtained in this paper are more adapted to the object detection task and can assist in improving the precision of detection.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"128 ","pages":"Article 117168"},"PeriodicalIF":3.4,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141637429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"USteg-DSE: Universal quantitative Steganalysis framework using Densenet merged with Squeeze & Excitation net","authors":"Anuradha Singhal, Punam Bedi","doi":"10.1016/j.image.2024.117171","DOIUrl":"10.1016/j.image.2024.117171","url":null,"abstract":"<div><p>Carrying concealed communication via media is termed as steganography and unraveling details of such covert transmission is known as steganalysis. Extracting details of hidden message like length, position, embedding algorithm etc. forms part of forensic steganalysis. Predicting length of payload in camouflaged interchange is termed as quantitative steganalysis and is an indispensable tool for forensic investigators. When payload length is estimated without any prior knowledge about cover media or used steganography algorithm, it is termed as universal quantitative steganalysis.</p><p>Most of existing frameworks on quantitative steganalysis available in literature, work for a specific embedding algorithm or are domain specific. In this paper we propose and present USteg-DSE, a deep learning framework for performing universal quantitative image steganalysis using DenseNet with Squeeze & Excitation module (SEM). In deep learning techniques, deeper networks easily capture complex statistical properties. But as depth increases, networks suffer from vanishing gradient problem. In classic architectures, all channels are equally weighted to produce feature maps. Presented USteg-DSE framework overcomes these problems by using DenseNet and SEM. In DenseNet, each layer is directly connected with every other layer. DenseNet makes information and gradient flow easier with fewer feature maps. SEM incorporates content aware mechanism to adaptively regulate weight for every feature map. Presented framework has been compared with existing state-of-the-art techniques for spatial domain as well as transform domain and show better results in terms of Mean Absolute Error (MAE) and Mean Square Error (MSE).</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"128 ","pages":"Article 117171"},"PeriodicalIF":3.4,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141705212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning sparse feature representation for blind quality assessment of night-time images","authors":"Maryam Karimi , Mansour Nejati","doi":"10.1016/j.image.2024.117167","DOIUrl":"10.1016/j.image.2024.117167","url":null,"abstract":"<div><p>Capturing Night-Time Images (NTIs) with high-quality is quite challenging for consumer photography and several practical applications. Thus, addressing the quality assessment of night-time images is urgently needed. Since there is no available reference image for such images, Night-Time image Quality Assessment (NTQA) should be done blindly. Although Blind natural Image Quality Assessment (BIQA) has attracted a great deal of attention for a long time, very little work has been done in the field of NTQA. Due to the capturing conditions, NTIs suffer from various complex authentic distortions that make it a challenging field of research. Therefore, previous BIQA methods, do not provide sufficient correlation with subjective scores in the case of NTIs and special methods of NTQA should be developed. In this paper we conduct an unsupervised feature learning method for blind quality assessment of night-time images. The features are the sparse representation over the data-adaptive dictionaries learned on the image exposure and gradient magnitude maps. Having these features, an ensemble regression model trained using least squares gradient boosting scheme predicts high correlated objective scores on the standard datasets.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"128 ","pages":"Article 117167"},"PeriodicalIF":3.4,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141630394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prior-DualGAN: Rain rendering from coarse to fine","authors":"Mingdi Hu , Jingbing Yang , Jianxun Yu , Bingyi Jing","doi":"10.1016/j.image.2024.117170","DOIUrl":"10.1016/j.image.2024.117170","url":null,"abstract":"<div><p>The success of deep neural networks (<em>DNN</em>) in deraining has led to increased research in rain rendering. In this paper, we introduce a novel <em>Prior-DualGAN</em> algorithm to synthesize diverse and realistic rainy/non-rainy image pairs to improve <em>DNN</em> training for deraining. More precisely, the rain streak prior is first generated using essential rain streak attributes; then more realistic and diverse rain streak patterns are rendered by the first generator; finally, the second generator naturally fuses the background and generated rain streaks to produce the final rainy images. Our method has two main advantages: (1) the rain streak prior enables the network to incorporate physical prior knowledge, accelerating network convergence; (2) our dual <em>GAN</em> approach gradually improves the naturalness and diversity of synthesized rainy images from rain streak synthesis to rainy image synthesis. We evaluate existing deraining algorithms using our generated rain-augmented datasets <em>Rain100L</em>, <em>Rain14000</em>, and <em>Rain-Vehicle</em>, verifying that training with our generated rain-augmented datasets significantly improves the deraining effect. The source code will be released shortly after article’s acceptance.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"129 ","pages":"Article 117170"},"PeriodicalIF":3.4,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141704976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}