Deep neural network based distortion parameter estimation for blind quality measurement of stereoscopic images

Yi Zhang, Damon M. Chandler, Xuanqin Mou

DOI: 10.1016/j.image.2024.117138
Signal Processing: Image Communication, Volume 126, Article 117138, published 2024-05-04 (JCR Q2, Engineering, Electrical & Electronic; Impact Factor 3.4)
Citations: 0
Abstract
Stereoscopic/3D image quality measurement (SIQM) has emerged as an active and important research branch in the image processing/computer vision field. Existing methods for blind/no-reference SIQM often train machine-learning models on degraded stereoscopic images for which human subjective quality ratings have been obtained, and they are thus constrained by the fact that only a limited number of 3D image quality datasets currently exist. Although methods have been proposed to overcome this restriction by predicting distortion parameters rather than quality scores, the approach is still limited by the time-consuming, hand-crafted features extracted to train the corresponding classification/regression models, as well as by the rather complicated binocular fusion/rivalry models used to predict the cyclopean view. In this paper, we explore the use of deep learning to predict distortion parameters, giving rise to a more efficient opinion-unaware SIQM technique. Specifically, a deep fusion-and-excitation network which takes into account multiple-distortion interactions is proposed to perform distortion parameter estimation, thus avoiding hand-crafted features by using convolution layers while simultaneously accelerating the algorithm on the GPU. Moreover, we measure distortion parameter values of the cyclopean view by using support vector regression models trained on data obtained from a newly designed subjective test. In this way, the potential errors in computing the disparity map and cyclopean view can be prevented, leading to a more rapid and precise 3D-vision distortion parameter estimation. Experimental results on various 3D image quality datasets demonstrate that our proposed method, in most cases, offers improved predictive performance over existing state-of-the-art methods.
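The abstract does not give the architecture of the fusion-and-excitation network, but the name suggests channel-gating in the spirit of squeeze-and-excitation blocks, where pooled channel statistics drive a small bottleneck MLP whose sigmoid outputs reweight the feature maps. The sketch below is a minimal NumPy illustration of that generic gating mechanism under this assumption; the function name, weight shapes, and reduction ratio are hypothetical and are not taken from the paper.

```python
import numpy as np

def excitation_block(features, w1, w2):
    """Channel re-weighting in the spirit of a squeeze-and-excitation block.

    features : (C, H, W) feature maps
    w1       : (R, C) bottleneck weights (R < C is the reduction ratio)
    w2       : (C, R) expansion weights
    """
    # Squeeze: global average pool over spatial dims, (C, H, W) -> (C,)
    squeezed = features.mean(axis=(1, 2))
    # Excitation: bottleneck MLP, ReLU then sigmoid gating in (0, 1)
    hidden = np.maximum(0.0, w1 @ squeezed)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))
    # Reweight each channel's feature map by its gate
    return features * gates[:, None, None]

rng = np.random.default_rng(0)
C, H, W, R = 8, 4, 4, 2          # toy sizes, not from the paper
feats = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((R, C))
w2 = rng.standard_normal((C, R))
out = excitation_block(feats, w1, w2)
```

Because the gates lie strictly in (0, 1), the block can only attenuate channels, never amplify them; in a trained network this lets informative distortion-specific channels pass while suppressing the rest.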
Journal Introduction
Signal Processing: Image Communication is an international journal for the development of the theory and practice of image communication. Its primary objectives are the following:
To present a forum for the advancement of theory and practice of image communication.
To stimulate cross-fertilization between areas similar in nature which have traditionally been separated, for example, various aspects of visual communications and information systems.
To contribute to a rapid information exchange between the industrial and academic environments.
The editorial policy and the technical content of the journal are the responsibility of the Editor-in-Chief, the Area Editors and the Advisory Editors. The journal is self-supporting from subscription income and carries a minimal number of advertisements. Advertisements are subject to the prior approval of the Editor-in-Chief. The journal welcomes contributions from every country in the world.
Signal Processing: Image Communication publishes articles relating to aspects of the design, implementation and use of image communication systems. The journal features original research work, tutorial and review articles, and accounts of practical developments.
Subjects of interest include image/video coding, 3D video representations and compression, 3D graphics and animation compression, HDTV and 3DTV systems, video adaptation, video over IP, peer-to-peer video networking, interactive visual communication, multi-user video conferencing, wireless video broadcasting and communication, visual surveillance, 2D and 3D image/video quality measures, pre/post processing, video restoration and super-resolution, multi-camera video analysis, motion analysis, content-based image/video indexing and retrieval, face and gesture processing, video synthesis, 2D and 3D image/video acquisition and display technologies, architectures for image/video processing and communication.