{"title":"SPmat: A Framework and Data Representation for Binary Image Processing","authors":"Fabrizio Pedersoli, G. Tzanetakis","doi":"10.1109/MMSP.2018.8547142","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547142","url":null,"abstract":"We propose an optimized framework for binary image processing, characterized by a highly bit-packed representation of pixels and their square neighbourhood. The Super-Packed (SPmat) representation for binary images enables the easy use of bit-wise computations for developing fast processing algorithms, such as: morphology, contours, run-length, and thinning, in a unified framework. With several experiments, we show that the aforementioned algorithms can be consistently sped-up, and outperform by a large margin available software implementations. The software package is freely available on github at the url https://github.com/fpeder/spmat to support reproducibility.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116930084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Polyphonic Sound Event Detection by Using Multi Frame Size Denoising Autoencoder","authors":"Jianchao Zhou, Xiaoou Chen, Deshun Yang","doi":"10.1109/MMSP.2018.8547060","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547060","url":null,"abstract":"Over the past few years, lots of research has been done on polyphonic sound event detection. A main problem with sound event detection is that the detection performance sharply degrades in the presence of noise. As denoising autoencoder reportedly has superior performance in noisy environments, this paper proposes to use denoising autoencoder, which is trained by multi frame size information of audio signals, to extract robust features in a task of polyphonic sound event detection under noisy conditions. Performance of the extracted feature is evaluated by polyphonic sound event detection experiments with different noise levels, and compared with that of baseline features including Mel-band Energy (Mel), Log mel-band Energy (Logmel) and mel-frequency cepstral coefficients (MFCC). The experimental results show that the proposed feature has the best robustness among all features and achieves the best detection effect under noisy conditions.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124696493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Study on Viewing Time with Regards to Quality Factors in Adaptive Bitrate Video Streaming","authors":"Pierre R. Lebreton, Kimiko Kawashima, Kazuhisa Yamagishi, J. Okamoto","doi":"10.1109/MMSP.2018.8547057","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547057","url":null,"abstract":"In this work, the evaluation of user engagement's characteristics in adaptive bitrate video streaming is addressed. To this aim, the viewing time and its relation with video quality is studied in two carefully designed subjective tests. Video quality and viewing time were addressed in distinct experiments. In case of viewing time, users were allowed to stop watching the videos when they desired. It was found that for low-quality videos, the number of users dropping the video increases logarithmically as a function of time. In addition, when a stalling event occurs, users start dropping video playback after a waiting period of 5 seconds. Then, when the stalling ends, the dropping rate returns to its baseline rate (which depends on video quality). The number of users stopping watching video after a stalling event was found to be a function of stalling position, stalling duration, and the quality affected by coding. A baseline model considering only stalling features is defined. Finally, a model for predicting the video completion rate is proposed that achieves a Pearson correlation of 0.96 and a root-mean-square error (RMSE) of 0.064.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130310069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech Intelligibility of Microphone Arrays in Reverberant Environments with Interference","authors":"Elham Ideli, R. Vaughan, I. Bajić","doi":"10.1109/MMSP.2018.8547053","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547053","url":null,"abstract":"It is known that speech intelligibility degrades with additive noise and reverberation, and that quantitative parameters such as fidelity and signal-to-noise ratio can be improved by using microphone arrays with various beamforming algorithms. However, it is not clear how the array configuration impacts the intelligibility of speech. Numerical experiments, using widely-used models, provide the most convenient comparison, and the approach allows rapid assessment of parameters such as the array configuration, the number and spacing of the elements, and modeled features such as room reflection coefficients. For a typical reverberant room with a single wanted source and two unwanted sources (interferers), we compare the performance of two ceiling-mounted configurations - the uniform linear array (ULA) and a uniform circular array (UCA). The microphones are taken as omnidirectional and equispaced along the array loci, and we use a standard gain-constrained power minimization beamformer. In this study, a limiting performance is presented by emphasizing the early reflections over the late ones for the prior steering vector. Under this steering vector condition, for the same number of elements, the UCA easily outperforms the ULA on known quality and intelligibility metrics. For both arrays in this room scenario, all the metrics increase with an increasing number of microphones, although for one intelligibility metric, diminishing returns set in at about 12 microphones.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130415410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple Source Location Estimation on a Dataset of Real Recordings in a Wireless Acoustic Sensor Network","authors":"Anastasios Alexandridis, Anthony Griffin, A. Mouchtaris","doi":"10.1109/MMSP.2018.8547105","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547105","url":null,"abstract":"Recently, wireless acoustic sensor networks (WASNs) have received significant attention from the research community and a variety of methods have been proposed for numerous applications, such as location estimation and speech enhancement. The lack of publicly available datasets with signals recorded in WASNs presents difficulties in obtaining consistent performance indicators across the different approaches. In this paper, we present and release a dataset of real recorded signals in an outdoor WASN comprised of four microphone arrays. Our dataset consists of several speakers recorded at various locations within the WASN and can be used for benchmarking purposes. We also present location estimation results using our real recorded dataset. Our results can serve as a baseline indicator of localization performance of single and multiple sources in a real environment.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121544978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Quality Evaluation for Tile-Based Spatial Adaptation","authors":"Hiba Yousef, J. L. Feuvre, G. Valenzise, Vedad Hulusic","doi":"10.1109/MMSP.2018.8547126","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547126","url":null,"abstract":"The demand for very high-resolution video content in entertainment services (4K, 8K, panoramic, 360 VR) puts an increasing load on the distribution network. In order to reduce the network usage in existing delivery infrastructure for such services while keeping a good quality of experience, dynamic spatial video adaptation at the client side is seen as a key feature, and is actively investigated by academics and industrials. However, the impact of spatial adaptation on quality perception is not clear. In this paper, we propose a methodology for the evaluation of such adapted content, conduct a series of perceived quality measurements and discuss results showing potential benefits and drawbacks of the technique. Based on our results, we also propose a signaling mechanism in MPEG-DASH to assist the client in its spatial adaptation logic.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121714848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reliability Analysis of IoVT Based Intelligent Video Surveillance System","authors":"T. Sultana, Mohammad Wajih Alam, K. Wahid","doi":"10.1109/MMSP.2018.8547101","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547101","url":null,"abstract":"The Internet of things (IoT) provides a broad range of products and services to allow autonomous and secured connectivity by exchanging information between the abstract world and human activities. To ensure that these advanced systems function at their maximum capacity for a long time without any failure or fault, creating benchmarks for them is highly needed. Intelligent video surveillance system (VSS) under the Internet of video things (IoVT) paradigm is providing significant benefits being the most important safety and security tool in a variety of applications. The deterministic approaches used in most of existing VSS performance evaluation methods may often lead to either increment in cost or frequent system failures. To overcome these limitations, this paper proposes a different framework to analyze the system via the concept of reliability. The existing approaches depend solely upon the system input sequences. However, the proposed framework works on the component statistical index. Based on exponential distribution theory of reliability, the reliability of the intelligent VSS is analyzed in this paper, followed by a complete guideline to improve the current reliability of the system.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132421067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Downsampling Based Image Coding Using Dual Dictionary Learning and Sparse Representations","authors":"A. Akbari, M. Trocan","doi":"10.1109/MMSP.2018.8547104","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547104","url":null,"abstract":"Downsampling based image compression scheme achieves better quality at low bit rates. This paper presents a new scheme in such a paradigm based on adaptive sparse representations with respect to two trained overcomplete dictionaries. The original image is downsampled at the encoder side and an upscaling technique is employed to restore the downsampled image to its original resolution at the decoder side. Due to the downsampling, the high frequency details are removed; therefore, the bit budget of low frequency information is increased, leading to better coding performance at the low bitrates. In order to further improve the coding efficiency, we also propose to encode the residual image as side information. This residual image is obtained by difference between the original image and upscaled image. The low resolution image and the residual image are represented over two dictionaries trained by a bilevel dictionary learning algorithm. Furthermore, the visual salient information is considered into the rate allocation process to improve the rate-distortion performance. The enhanced scheme achieves improvement of the quality at a variety of bitrates at the expense of increasing the system complexity, when compared to the conventional codecs.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133085406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blastomere Cell Counting and Centroid Localization in Microscopic Images of Human Embryo","authors":"Reza Moradi Rad, Parvaneh Saeedi, J. Au, J. Havelock","doi":"10.1109/MMSP.2018.8547107","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547107","url":null,"abstract":"The time of the first cell cleavage in the embryonic development of a human embryo is an important indicator of the embryo's potential for developing into a healthy baby. The time and synchronicity of following cleavages are also linked to the quality of an embryo. In this paper, a deep learning based framework is proposed to take on the challenging task of automatic counting and centroid localization of embryonic cells (blastomeres) in microscopic images of human embryos. In particular, ensemble of residual dilated UNet is proposed to count blastomeres and localize their centroids. Experimental results confirm that the proposed framework is capable of counting blastomeres in a densely occupied and overlapping space of human embryo by an average accuracy of 88.2% for embryos of 1 - 5 cells.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131745996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Color-Guided Depth Map Super-Resolution via Joint Graph Laplacian and Gradient Consistency Regularization","authors":"Rong Chen, Deming Zhai, Xianming Liu, Debin Zhao","doi":"10.1109/MMSP.2018.8547124","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547124","url":null,"abstract":"Depth information is being widely used in many real-world applications. However, due to the limitation of depth sensing technology, the captured depth map in practice usually has much lower resolution than that of color image counterpart. In this paper, we propose to jointly exploit the internal smoothness prior and external gradient consistency constraint in graph domain for depth super-resolution. On one hand, a new graph Laplacian regularizer is proposed to preserve the inherent piecewise smooth characteristic of depth, which has desirable filtering properties. On the other hand, inspired by an observation that the gradient of depth is zero except at edge separating regions, we introduce a graph gradient consistency constraint to enforce that the graph gradient of depth is close to the thresholded gradient of guidance. Finally, the internal and external regularizations are cast into a unified optimization framework, which can be efficiently addressed by ADMM. Experimental results demonstrate that our method outperforms the state-of-the-art with respect to both objective and subjective quality evaluations.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"1954 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129522549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}