{"title":"Improving Real-Time Pedestrian Detection Using Adaptive Confidence Thresholding and Inter-Frame Correlation","authors":"M. Al-Shatnawi, Vida Movahedi, A. Asif, Aijun An","doi":"10.1109/MMSP.2018.8547103","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547103","url":null,"abstract":"The pedestrian detection algorithms form a key component in the multiple pedestrian tracking (MPT) systems. Despite efforts to detect a pedestrian accurately, it is still a challenging task. We propose a novel and efficient online method to improve the performance of the multiple person/pedestrian detector by introducing novel post-processing steps. These steps use an adaptive approach to determine both area and confidence score constraints for the output of any given multiple pedestrian detector. In this paper, we focus on pedestrian detection in video surveillance applications that require an automated, accurate and precise pedestrian detection algorithm. We demonstrate that the new steps make the multiple pedestrian detector more accurate, precise and tolerant to false positive detections. This is illustrated by evaluating the performance of the proposed method in test video sequences taken from the Pedestrian Detection Challenge, Multiple Object Tracking Benchmark (MOT Challenge 2017).","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117149257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Transfer Learning for Hyperspectral Image Classification","authors":"Jianzhe Lin, R. Ward, Z. J. Wang","doi":"10.1109/MMSP.2018.8547139","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547139","url":null,"abstract":"Hyperspectral image (HSI) includes a vast quantities of samples, large number of bands, as well as randomly occurring redundancy. Classifying such complex data is challenging, and the classification performance generally is affected significantly by the amount of labeled training samples. Collecting such labeled training samples is labor and time consuming, motivating the idea of borrowing and reusing labeled samples from other preexisting related images. Therefore transfer learning, which can mitigate the semantic gap between existing and new HSI, has recently drawn increasing research attention. However, existing transfer learning methods for HSI which concentrated on how to overcome the divergence among images, may neglect the high level latent features during the transfer learning process. In this paper, we present two novel ideas based on this observation. We propose constructing and connecting higher level features for the source and target HSI data, to further overcome the cross-domain disparity. Different from existing methods, no priori knowledge on the target domain is needed for the proposed classification framework, and the proposed framework works for both homogeneous and heterogenous HSI data. Experimental results on real world hyperspectral images indicate the significance of the proposed method in HSI classification.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123020165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quality Assessment of Deep-Learning-Based Image Compression","authors":"G. Valenzise, Andrei I. Purica, Vedad Hulusic, Marco Cagnazzo","doi":"10.1109/MMSP.2018.8547064","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547064","url":null,"abstract":"Image compression standards rely on predictive coding, transform coding, quantization and entropy coding, in order to achieve high compression performance. Very recently, deep generative models have been used to optimize or replace some of these operations, with very promising results. However, so far no systematic and independent study of the coding performance of these algorithms has been carried out. In this paper, for the first time, we conduct a subjective evaluation of two recent deep-learning-based image compression algorithms, comparing them to JPEG 2000 and to the recent BPG image codec based on HEVC Intra. We found that compression approaches based on deep auto-encoders can achieve coding performance higher than JPEG 2000, and sometimes as good as BPG. We also show experimentally that the PSNR metric is to be avoided when evaluating the visual quality of deep-learning-based methods, as their artifacts have different characteristics from those of DCT or wavelet-based codecs. In particular, images compressed at low bitrate appear more natural than JPEG 2000 coded pictures, according to a no-reference naturalness measure. Our study indicates that deep generative models are likely to bring huge innovation into the video coding arena in the coming years.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121655117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Motion Compensated Prediction for Translational Camera Motion in Spherical Video Coding","authors":"B. Vishwanath, Tejaswi Nanjundaswamy, K. Rose","doi":"10.1109/MMSP.2018.8547066","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547066","url":null,"abstract":"Spherical video is the key driving factor for the growth of virtual reality and augmented reality applications, as it offers truly immersive experience by capturing the entire 3D surroundings. However, it represents an enormous amount of data for storage/transmission and success of all related applications is critically dependent on efficient compression. A frequently encountered type of content in this video format is due to translational motion of the camera (e.g., a camera mounted on a moving vehicle). Existing approaches simply project this video onto a plane and use block based translational motion model for capturing the motion of the objects between the frames. This ad-hoc simplified approach completely ignores the complex deformities of objects caused due to the combined effect of the moving camera and projection onto a plane, rendering it significantly suboptimal. In this paper, we provide an efficient solution tailored to this problem. Specifically, we propose to perform motion compensated prediction by translating pixels along their geodesics, which intersect at the poles corresponding to the camera velocity vector. This setup not only captures the surrounding objects' motion exactly along the geodesics of the sphere, but also accurately accounts for the deformations caused due to projection on the sphere. Experimental results demonstrate that the proposed framework achieves very significant gains over existing motion models.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123860373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Classification of Farming Activities with Motion-Adaptive Feature Sampling","authors":"He Liu, A. Reibman, A. Ault, J. Krogmeier","doi":"10.1109/MMSP.2018.8547117","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547117","url":null,"abstract":"Recently, video has been applied in different industrial applications including autonomous driving vehicles. However, to develop autonomous farming vehicles, the video analysis must be targeted for specific farming activities. So an important first step is to classify the videos into their specific farming activity. In this paper, we propose a video classification framework that includes two branches that process videos differently based on their motions. A gradient-based method is proposed for separating videos into two subsets which are then processed by different feature sampling strategies. The result shows that two motion-based feature sampling strategies provide more efficient features; thus better classification performances are achieved. We also discuss how the feature sampling strategy influences the classification accuracy and the computational efficiency. In addition to farming videos, this proposed system can also be applied to classify videos captured from various camera movements, such as hand-held or first-person cameras.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124019949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Image Provenance: An Analysis of Mobile Instant Messaging Apps","authors":"Quoc-Tin Phan, Cecilia Pasquini, G. Boato, F. D. Natale","doi":"10.1109/MMSP.2018.8547050","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547050","url":null,"abstract":"Studying the impact of sharing platforms like social networks and messaging services on multimedia content nowadays represents a due step in multimedia forensics research. In this framework, we study the characteristics of images that are uploaded and shared through three popular mobile messaging apps combined with two different sending mobile operating systems (OS). In our analysis, we consider information contained both in the image signal and in the metadata of the image file. We show that it is generally possible to identify a posteriori the last app and the OS that have been used for uploading. This is done by considering different scenarios involving images shared both once and twice. Moreover, we show that, by leveraging the knowledge of the last sharing app and system, it is possible to retrieve information on the previous sharing step for double shared images. In relation to prior works, a discussion on the influence of the rescaling and recompression mechanism - usually performed differently through apps and OSs - is also proposed, and the feasibility of retrieving the compression parameters of the image before being shared is assessed.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126219759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse Hartley Modeling for Fast Image Extrapolation","authors":"Nils Genser, Simon Grosche, Jürgen Seiler, André Kaup","doi":"10.1109/MMSP.2018.8547100","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547100","url":null,"abstract":"In many cases, image and video signal processing demands for high quality extrapolation algorithms, e.g., to solve inpainting problems or to increase image resolution. Indeed, a high computational load goes hand in hand with a good reconstruction quality as expensive models are calculated to estimate the missing data. To overcome this, the high-speed sparse Hartley modeling is introduced in this paper. This algorithm is based on Frequency Selective Extrapolation. In contrast to that, the model generation is carried out in the Hartley domain to exploit its real-valued transform properties. Due to this, it is possible to reduce the computational complexity significantly as no complex-valued arithmetic operations have to be conducted. In other words, a slightly higher reconstruction quality is obtained, while the proposed method is more than three times faster than the competing Frequency Selective Extrapolation.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126314980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Siamese Network for Multiple Object Tracking","authors":"Bonan Cuan, Khalid Idrissi, Christophe Garcia","doi":"10.1109/MMSP.2018.8547137","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547137","url":null,"abstract":"Multiple object tracking is an important but challenging computer vision task. Thanks to the significant progress in object detection field, tracking-by-detection becomes a trending paradigm for tracking multiple objects at the same time. Appearance models are also widely used for associating detection results. In this paper, we combine cosine similarity metric learning with very deep convolutional neural network, yielding a robust appearance pairwise matching model: a deep Siamese network capable of re-identifying the same object after a long time and dealing with partial and complete occlusion. Embedded in existing tracking algorithms, our model is a lightweight but powerful module for decision-making among track hypotheses. Experiments on MOT Challenge 2016 benchmark [1] demonstrate the effectiveness of our model, which achieves state-of-the-art performance without delving into extensive hyper-parameter tuning.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132414206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Color Noise-Based Feature for Splicing Detection and Localization","authors":"C. Destruel, V. Itier, O. Strauss, W. Puech","doi":"10.1109/MMSP.2018.8547093","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547093","url":null,"abstract":"Images that have been altered and more specifically spliced together have invaded the digital domain due to the ease with which we are able to copy and paste them. To detect such forgeries the digital image processing community is proposing new automatic algorithms designed to help human operators reveal manipulated images. In this paper, we focus on a local detection system, which considers which tampered areas produce local statistical effects that do not impact neighboring areas or the image as a whole. We propose to study how the definition of local blocks, considering their size and overlap, impacts final pixel detection. We also propose new features which are an original way to consider the noise of an image as a colored signal. Indeed, in a non-forged image, there is a high correlation of noise between the three color channels R, G and B. We show that an optimal configuration can be defined and in this case the proposed approach outperforms several previously proposed methods using the same tested dataset, in uncompressed and JPEG modes. Note, in this paper we only focus on feature extraction without using machine learning.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133972866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial Reinforcement and Immersive Audio","authors":"Timothy Bartoo, R. Whittaker, Dave Haydon","doi":"10.1109/MMSP.2018.8547099","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547099","url":null,"abstract":"Different techniques are used for spatial audio on its different scales such as binaural headphone, home theatre, and auditoria. The large scale is particularly challenging, and the development and deployment of effective processing calls on understanding the nature of not just the sound signals, but also how the brain interprets the signals and associated cues such as visual. This paper reviews the progress which has led to state-of-the-art techniques used for large venue acoustics. It is in tutorial style as seen primarily by practitioners rather than from a pure academic viewpoint.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134077727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}