{"title":"SIQ288: A saliency dataset for image quality research","authors":"Wei Zhang, Hantao Liu","doi":"10.1109/MMSP.2016.7813334","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813334","url":null,"abstract":"Saliency modelling for image quality research has been an active topic in multimedia over the last five years. Saliency aspects have been added to many image quality metrics (IQMs) to improve their performance in predicting perceived quality. However, challenges to optimising the performance of saliency-based IQMs remain. To make further progress, a better understanding of human attention deployment in relation to image quality through eye-tracking experimentation is indispensable. Collecting substantial eye-tracking data is often confronted with a bias due to the involvement of massive stimulus repetition that typically occurs in an image quality study. To mitigate this problem, we proposed a new experimental methodology with dedicated control mechanisms, which allows collecting more reliable eye-tracking data. We recorded 5760 trials of eye movements from 160 human observers. Our dataset consists of 288 images representing a large degree of variability in terms of scene content, distortion type as well as degradation level. We illustrate how saliency is affected by the variations of image quality. We also compare state of the art saliency models in terms of predicting where people look in both original and distorted scenes. Our dataset helps investigate the actual role saliency plays in judging image quality, and provides a benchmark for gauging saliency models in the context of image quality.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130621818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Laplacian one class extreme learning machines for human action recognition","authors":"V. Mygdalis, Alexandros Iosifidis, A. Tefas, I. Pitas","doi":"10.1109/MMSP.2016.7813387","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813387","url":null,"abstract":"A novel OCC method for human action recognition namely the Laplacian One Class Extreme Learning Machines is presented. The proposed method exploits local geometric data information within the OC-ELM optimization process. It is shown that emphasizing on preserving the local geometry of the data leads to a regularized solution, which models the target class more efficiently than the standard OC-ELM algorithm. The proposed method is extended to operate in feature spaces determined by the network hidden layer outputs, as well as in ELM spaces of arbitrary dimensions. Its superior performance against other OCC options is consistent among five publicly available human action recognition datasets.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"304 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116373258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effect of content features on short-term video quality in the visual periphery","authors":"Yashas Rai, Ahmed Aldahdooh, Suiyi Ling, M. Barkowsky, P. Callet","doi":"10.1109/MMSP.2016.7813350","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813350","url":null,"abstract":"The area outside our central field of vision, also referred to as the visual periphery, captures most information in a visual scene, although much less sensitive than the central Fovea. Vision studies in the past have stated that there is reduced sensitivity of texture, color, motion and flicker (temporal harmonic) perception in this area, that bears an interesting application in the domain of quality perception. In this work, we particularly analyze the perceived subjective quality of videos containing H.264/AVC transmission impairments, incident at various degrees of retinal eccentricities of observers. We relate the perceived drop in quality, to five basic types of features that are important from a perceptive standpoint: texture, color, flicker, motion trajectory distortions and also the semantic importance of the underlying regions. We are able to observe that the perceived drop in quality across the visual periphery, is closely related to the Cortical Magnification fall-off characteristics of the V1 cortical region. Additionally, we see that while object importance and low frequency spatial distortions are important indicators of quality in the central foveal region, temporal flicker and color distortions are the most important determinants of quality in the periphery. We therefore conclude that, although users are more forgiving of distortions they viewed peripherally, they are nevertheless not totally blind towards it: the effects of flicker and color distortions being particularly important.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122047540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting in image quality assessment","authors":"Dogancan Temel, G. Al-Regib","doi":"10.1109/MMSP.2016.7813335","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813335","url":null,"abstract":"In this paper, we analyze the effect of boosting in image quality assessment through multi-method fusion. On the contrary of existing studies that propose a single quality estimator, we investigate the generalizability of multi-method fusion as a framework. In addition to support vector machines that are commonly used in the multi-method fusion studies, we propose using neural networks in the boosting. To span different types of image quality assessment algorithms, we use quality estimators based on fidelity, perceptually-extended fidelity, structural similarity, spectral similarity, color, and learning. In the experiments, we perform k-fold cross validation using the LIVE, the multiply distorted LIVE, and the TID 2013 databases and the performance of image quality assessment algorithms are measured via accuracy-, linearity-, and ranking-based metrics. Based on the experiments, we show that boosting methods generally improve the performance of image quality assessment and the level of improvement depends on the type of the boosting algorithm. Our experimental results also indicate that boosting the worst performing quality estimator with two or more methods lead to statistically significant performance enhancements independent of the boosting technique and neural network-based boosting outperforms support vector machine-based boosting when two or more methods are fused.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128777524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segmentation based 3D depth watermarking using SIFT","authors":"Shuvendu Rana, S. Gaj, A. Sur, P. Bora","doi":"10.1109/MMSP.2016.7813367","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813367","url":null,"abstract":"In this paper, a 3D image watermarking scheme is proposed to embed the watermark with the depth of the 3D image for depth image based rendering (DIBR) 3D image representation. To make the scheme invariant to view synthesis process, watermark is inserted with the scale invariant feature transform (SIFT) feature point locations obtained from the original image. Moreover, embedding zone for watermarking has been selected in such a way that no watermark can be inserted in the foreground object to avoid perceptible artefacts. Also, a novel watermark embedding policy is used to insert the watermark with the depth of the 3D image to resist the image processing attacks. A comprehensive set of experiments are carried out to justify the robustness of the proposed scheme.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129209952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-way real time multimedia stream authentication using physical unclonable functions","authors":"Mehrdad Zaker Shahrak, Mengmei Ye, Viswanathan Swaminathan, Sheng Wei","doi":"10.1109/MMSP.2016.7813398","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813398","url":null,"abstract":"Multimedia authentication is an integral part of multimedia signal processing in many real-time and security sensitive applications, such as video surveillance. In such applications, a full-fledged video digital rights management (DRM) mechanism is not applicable due to the real time requirement and the difficulties in incorporating complicated license/key management strategies. This paper investigates the potential of multimedia authentication from a brand new angle by employing hardware-based security primitives, such as physical unclonable functions (PUFs). We show that the hardware security approach is not only capable of accomplishing the authentication for both the hardware device and the multimedia stream but, more importantly, introduce minimum performance, resource, and power overhead. We justify our approach using a prototype PUF implementation on Xilinx FPGA boards. Our experimental results on the real hardware demonstrate the high security and low overhead in multimedia authentication obtained by using hardware security approaches.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124579411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple human detection in depth images","authors":"M. H. Khan, Kimiaki Shirahama, M. S. Farid, M. Grzegorzek","doi":"10.1109/MMSP.2016.7813385","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813385","url":null,"abstract":"Most human detection algorithms in depth images perform well in detecting and tracking the movements of a single human object. However, their performance is rather poor when the person is occluded by other objects or when there are multiple humans present in the scene. In this paper, we propose a novel human detection technique which analyzes the edges in depth image to detect multiple people. The proposed technique detects a human head through a fast template matching algorithm and verifies it through a 3D model fitting technique. The entire human body is extracted from the image by using a simple segmentation scheme comprising a few morphological operators. Our experimental results on three large human detection datasets and the comparison with the state-of-the-art method showed an excellent performance achieving a detection rate of 94.53% with a small false alarm of 0.82%.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125283613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical differential image filters for skin analysis","authors":"Jingyi Zhang, P. Aarabi","doi":"10.1109/MMSP.2016.7813384","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813384","url":null,"abstract":"In this paper we present a framework for analyzing skin parameters from portrait images and videos. Using a series of Hierarchical Differential Image Filters (HDIF), it becomes possible to detect different skin features such as wrinkles, spots, and roughness. These detected features are used to compute skin ratings that are compared to actual ratings by dermatologists. Analyzing a database of 49 images with ratings by a panel of dermatologists, the proposed HDIF method is able to detect skin roughness, dark spots, and deep wrinkles with an average rating error of 11.3%, 17.6%, and 15.6%, respectively, as compared to individual dermatologist rating errors of 8.2%, 7.4%, and 6.5%. Although dermatologist ratings are more accurate than the proposed HDIF method, the ratings are close enough that the HDIF ratings can be a viable solution where dermatologist ratings are not readily available.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"230 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115619995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Delay-rate-distortion optimization for cloud-based collaborative rendering","authors":"Xiaoming Nan, Yifeng He, L. Guan","doi":"10.1109/MMSP.2016.7813405","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813405","url":null,"abstract":"Cloud rendering is emerged as a new cloud service to satisfy user's desire for running sophisticated graphics applications on thin devices. However, traditional cloud rendering approaches, both remote rendering and local rendering, have limitations. Remote rendering shifts intensive rendering tasks to cloud server and streams rendered frames to client, which suffers from high delay and bandwidth usage. Local rendering sends graphics data to client and performs rendering on local devices, which requires initial buffering delay and demands high computation capacity at client. In this paper, we propose a novel cloud based collaborative rendering framework, which adaptively integrates remote rendering and local rendering. Based on the proposed framework, we study the delay-Rate-Distortion (d-R-D) optimization problem, in which the source rates are optimally allocated for streaming encoded video frames and graphics data to minimize the overall distortion under the bandwidth and response delay constraints. Experiment results demonstrate that the proposed collaborative rendering framework can effectively allocate source rates to achieve the minimal distortion compared to the traditional remote rendering and local rendering.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122712166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid video object tracking in H.265/HEVC video streams","authors":"Serhan Gül, Jan Timo Meyer, C. Hellge, T. Schierl, W. Samek","doi":"10.1109/MMSP.2016.7813363","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813363","url":null,"abstract":"In this paper we propose a hybrid tracking method which detects moving objects in videos compressed according to H.265/HEVC standard. Our framework largely depends on motion vectors (MV) and block types obtained by partially decoding the video bit stream and occasionally uses pixel domain information to distinguish between two objects. The compressed domain method is based on a Markov Random Field (MRF) model that captures spatial and temporal coherence of the moving object and is updated on a frame-to-frame basis. The hybrid nature of our approach stems from the usage of a pixel domain method that extracts the color information from the fully-decoded I frames and is updated only after completion of each Group-of-Pictures (GOP). We test the tracking accuracy of our method using standard video sequences and show that our hybrid framework provides better tracking accuracy than a state-of-the-art MRF model.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124926022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}