{"title":"A Dual Path Deep Network for Single Image Super-Resolution Reconstruction","authors":"Fateme S. Mirshahi, Parvaneh Saeedi","doi":"10.1109/MMSP.2018.8547049","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547049","url":null,"abstract":"Super-resolution reconstruction based on deep learning has come a long way since the first proposed method in 2015. Numerous methods have been developed for this task using deep learning approaches. Among these methods, residual deep learning algorithms have shown better performance. Although all early proposed deep learning based super-resolution frameworks used bicubic upsampled versions of low resolution images as the main input, most of the current ones use the low resolution images directly by adding up-sampling layers to their networks. In this work, we propose a new method by using both low resolution and bicubic upsampled images as the inputs to our network. The final results confirm that decreasing the depth of the network in lower resolution space and adding the bicubic path lead to almost similar results to those of the deeper networks in terms of PSNR and SSIM, yet making the network computationally inexpensive and more efficient.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125678776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adversarial Attacks on Face Detectors Using Neural Net Based Constrained Optimization","authors":"A. Bose, P. Aarabi","doi":"10.1109/MMSP.2018.8547128","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547128","url":null,"abstract":"Adversarial attacks involve adding, small, often imperceptible, perturbations to inputs with the goal of getting a machine learning model to misclassifying them. While many different adversarial attack strategies have been proposed on image classification models, object detection pipelines have been much harder to break. In this paper, we propose a novel strategy to craft adversarial examples by solving a constrained optimization problem using an adversarial generator network. Our approach is fast and scalable, requiring only a forward pass through our trained generator network to craft an adversarial sample. Unlike in many attack strategies we show that the same trained generator is capable of attacking new images without explicitly optimizing on them. We evaluate our attack on a trained Faster R-CNN face detector on the cropped 300-W face dataset where we manage to reduce the number of detected faces to 0.5% of all originally detected faces. In a different experiment, also on 300-W, we demonstrate the robustness of our attack to a JPEG compression based defense typical JPEG compression level of 75% reduces the effectiveness of our attack from only 0.5% of detected faces to a modest 5.0%.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116362592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MV-YOLO: Motion Vector-Aided Tracking by Semantic Object Detection","authors":"Saeed Ranjbar Alvar, I. Bajić","doi":"10.1109/MMSP.2018.8547125","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547125","url":null,"abstract":"Object tracking is the cornerstone of many visual analytics systems. While considerable progress has been made in this area in recent years, robust, efficient, and accurate tracking in real-world video remains a challenge. In this paper, we present a hybrid tracker that leverages motion information from the compressed video stream and a general-purpose semantic object detector acting on decoded frames to construct a fast and efficient tracking engine. The proposed approach is compared with several well-known recent trackers on the OTB tracking dataset. The results indicate advantages of the proposed method in terms of speed and/or accuracy. Other desirable features of the proposed method are its simplicity and deployment efficiency, which stems from the fact that it reuses the resources and information that may already exist in the system for other reasons.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124104589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Near-Lossless Deep Feature Compression for Collaborative Intelligence","authors":"Hyomin Choi, I. Bajić","doi":"10.1109/MMSP.2018.8547134","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547134","url":null,"abstract":"Collaborative intelligence is a new paradigm for efficient deployment of deep neural networks across the mobile-cloud infrastructure. By dividing the network between the mobile and the cloud, it is possible to distribute the computational workload such that the overall energy and/or latency of the system is minimized. However, this necessitates sending deep feature data from the mobile to the cloud in order to perform inference. In this work, we examine the differences between the deep feature data and natural image data, and propose a simple and effective near-lossless deep feature compressor. The proposed method achieves up to 5% bit rate reduction compared to HEVC-Intra and even more against other popular image codecs. Finally, we suggest an approach for reconstructing the input image from compressed deep features that could serve to supplement the inference performed by the deep model.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129922254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Çağlar Aytekin, Xingyang Ni, Francesco Cricri, Lixin Fan, Emre B. Aksu
{"title":"Memory-Efficient Deep Salient Object Segmentation Networks on Gridized Superpixels","authors":"Çağlar Aytekin, Xingyang Ni, Francesco Cricri, Lixin Fan, Emre B. Aksu","doi":"10.1109/MMSP.2018.8547096","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547096","url":null,"abstract":"Computer vision algorithms with pixel-wise labeling tasks, such as semantic segmentation and salient object detection, have gone through a significant accuracy increase with the incorporation of deep learning. Deep segmentation methods slightly modify and fine-tune pre-trained networks that have hundreds of millions of parameters. In this work, we question the need of having such memory demanding networks for a reasonable performance in salient object segmentation. To this end, we propose a way to learn a memory-efficient network from scratch by training it only on salient object detection datasets. Our method encodes images to gridized superpixels that preserve both the object boundaries and the connectivity rules of regular pixels. This representation allows us to use convolutional neural networks that operate on regular grids. By using these encoded images, we train a memory-efficient network using only 0.048% of the number of parameters that a majority of other deep salient object detection networks have. Our method shows comparable accuracy with the state-of-the-art deep salient object detection methods and provides a much more memory-efficient alternative to them. Due to its easy deployment and small size, such a network is preferable for applications in memory limited IoT devices.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132651871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement","authors":"J. Valin","doi":"10.1109/MMSP.2018.8547084","DOIUrl":"https://doi.org/10.1109/MMSP.2018.8547084","url":null,"abstract":"Despite noise suppression being a mature area in signal processing, it remains highly dependent on fine tuning of estimator algorithms and parameters. In this paper, we demonstrate a hybrid DSP/deep learning approach to noise suppression. We focus strongly on keeping the complexity as low as possible, while still achieving high-quality enhanced speech. A deep recurrent neural network with four hidden layers is used to estimate ideal critical band gains, while a more traditional pitch filter attenuates noise between pitch harmonics. The approach achieves significantly higher quality than a traditional minimum mean squared error spectral estimator, while keeping the complexity low enough for real-time operation at 48 kHz on a low-power CPU.","PeriodicalId":137522,"journal":{"name":"2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)","volume":"20 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120933229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}