{"title":"Audio Based Handwriting Input for Tiny Mobile Devices","authors":"Tuo Yu, Haiming Jin, K. Nahrstedt","doi":"10.1109/MIPR.2018.00030","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00030","url":null,"abstract":"The popularization of tiny mobile devices has raised the problem that it is hard to efficiently input messages via tiny keyboards or touch screens. In this paper, we present TableWrite, an audio-based handwriting input scheme, which allows users to input words to mobile devices by writing on tables with fingers. The key feature is that, once trained by a user, TableWrite does not require any retraining phase before each use. To reduce the impacts of audio signal’s multipath propagation, we design multiple features that maintain consistency even when writing positions keep changing. We apply machine learning and gesture tracking techniques to further improve the accuracy of handwriting recognition. Our prototype system’s experimental results show that the average accuracy of word recognition is around 90%-95% in lab environments, which validates the effectiveness of TableWrite.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125575767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimedia Retrieval that Works","authors":"R. Aygun, Wanda Benesova","doi":"10.1109/MIPR.2018.00019","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00019","url":null,"abstract":"Multimedia information retrieval has been a challenging problem due to the diversity and size of multimedia data along with difficulty of expressing desired queries. This paper highlights key points of multimedia retrieval approaches that work. After providing discussion on the success of multimedia information retrieval, the paper analyzes the problem of retrieval challenge (i.e., the capability of retrieving every multimedia object) and proposes page-oriented precision as an alternative evaluation measure for the performance of multimedia information systems.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117287756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Saliency Guided Shallow Convolutional Neural Network for Traffic Signs Retrieval","authors":"Xi Liang, Jing Zhang, Q. Tian, Jiafeng Li, L. Zhuo","doi":"10.1109/MIPR.2018.00076","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00076","url":null,"abstract":"As one of the important parts of road infrastructure, traffic signs provide vital information for road users. Achieving efficient traffic signs retrieval greatly contributes to the intelligent analysis on big traffic data. In this paper, we propose a saliency guided shallow convolutional neural network (CNN) for traffic signs accurate and fast retrieval. Firstly, by unifying deep saliency and hashing learning in a single architecture, the proposed CNN model performs joint learning in a point-wise manner, which is scalable on large-scale datasets. Then, deep saliency features and hashing-like outputs are extracted from traffic sign images with the saliency guided shallow CNN. The binarized hashing-like outputs together with saliency features are used to construct features database. Finally, a coarse to fine similarity measurement is performed by Euclidean distance and Hamming distance to return retrieval results. Experimental results demonstrate the retrieval accuracy of our method outperforms five state-of-the-art methods on GTSRB dataset.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"494 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122753006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Surface Light Field Modeling","authors":"Wei Li, Hui Qiao, Chen Zhao, Zhongqi Wu, Ruigang Yang","doi":"10.1109/MIPR.2018.00073","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00073","url":null,"abstract":"Surface light field advances conventional light field rendering techniques by utilizing geometry information. Using surface light field, real-world objects with complex appearance could be faithfully represented. This capability could play an important role in many VR/AR applications. However, an accurate geometric model is needed for surface light field sampling and processing, which limits its wide usage since many objects of interests are difficult if not impossible to reconstruct with their usually very complex appearances. We propose a novel optimization framework to reduce the dependency of accurate geometry. The key insight is to treat surface light sampling as a multi-view multi-texture optimization problem. Our approach can deal with both model inaccuracy and texture to model misalignment, making it possible to create high-fidelity surface light field models without using high-precision special hardware.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133350812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection of GAN-Generated Fake Images over Social Networks","authors":"Francesco Marra, Diego Gragnaniello, D. Cozzolino, L. Verdoliva","doi":"10.1109/MIPR.2018.00084","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00084","url":null,"abstract":"The diffusion of fake images and videos on social networks is a fast growing problem. Commercial media editing tools allow anyone to remove, add, or clone people and objects, to generate fake images. Many techniques have been proposed to detect such conventional fakes, but new attacks emerge by the day. Image-to-image translation, based on generative adversarial networks (GANs), appears as one of the most dangerous, as it allows one to modify context and semantics of images in a very realistic way. In this paper, we study the performance of several image forgery detectors against image-to-image translation, both in ideal conditions, and in the presence of compression, routinely performed upon uploading on social networks. The study, carried out on a dataset of 36302 images, shows that detection accuracies up to 95% can be achieved by both conventional and deep learning detectors, but only the latter keep providing a high accuracy, up to 89%, on compressed data.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116396158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Objective Quality Assessment Method for Perceptually-Coded Cloud Gaming Video","authors":"S. Sabet, M. Hashemi, S. Shirmohammadi, M. Ghanbari","doi":"10.1109/MIPR.2018.00021","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00021","url":null,"abstract":"Cloud Gaming (CG) as a viable alternative to console gaming is gaining more acceptance and growing its market share in the gaming industry. In CG, the game events are processed in the cloud and the resulting scenes are streamed as a video sequence to players. In this new paradigm, one of the most important factors that has a significant impact on user quality of experience is video quality. To address the inherent high bandwidth requirement of CG, game videos should be compressed. This compression may have a negative impact on the user’s quality of experience (QoE) and the assessment of this impact on user satisfaction is a challenging task. Over the years, many research works have investigated the objective and subjective quality of video, but none are directly suitable for the assessment of perceptual video quality in the context of CG. Other methods, such as eye-tracking weighted peak signal-to-noise ratio (EWPSNR) that may work in this context, require an eye-tracking device that is not always available. In this paper, we propose a new weighted PSNR objective quality method that does not require any eye-tracker or information from the game designer (such as the importance of objects in the game) to measure game video quality. Our evaluation based on 3 actual games show that our proposed method has 51% and 11% better correlation with the Mean Opinion Score (MOS) compared to PSNR and SSIM measures, respectively.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128796065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-types Court E-File Classification System","authors":"Wei Duan, Lin Li","doi":"10.1109/MIPR.2018.00048","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00048","url":null,"abstract":"When handling cases, the courts spend a lot of time and effort on the classification of electronic files. This paper discusses a classification system that intelligently classifies scanned images for uploading. We have proposed corresponding solutions to the various effects of scanning on E-File for lots of reasons. Then we designed two sets of different algorithms to achieve the classification by the features that were classified into two categories of documents and pictures. In reality, using this system shows the classification accuracy is 92%, greatly improving the efficiency of staff to reduce their burden.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124854467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Vehicle Motion Planning for Search and Tracking","authors":"Ju Wang, Weibo Chen, V. Temu","doi":"10.1109/MIPR.2018.00078","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00078","url":null,"abstract":"We present a vision-based search and rescue system which uses a unmanned aerial vehicle(UAV) swarm to search and track missing personnel/animals. A major benefit of multiple-vehicle search operation is the extended coverage due to the \"bridging\" effect between the vehicles, which allow a larger and further search area that is beyond the reach of a single vehicle. The challenge here is to plan the UAV swarm's motion paths while maintain the communication links between vehicles during the flight. Our path planning method uses a two-tie search algorithm to approximate the optimum paths for n-UAV search. The integrated vision pipeline and target recognition subsystem is evaluated with emulated UAVs and image sensors.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114183872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Impact of Multi-Optimizers and Data Augmentation on TensorFlow Convolutional Neural Network Performance","authors":"A. M. Taqi, AhmedM.El Awad, Fadwa Al-Azzo, M. Milanova","doi":"10.1109/MIPR.2018.00032","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00032","url":null,"abstract":"This paper introduces a new methodology for Alzheimer disease (AD) classification based on TensorFlow Convolu-tional Neural Network (TF-CNN). The network consists of three convolutional layers to extract AD features, a flatten-ing layer to reduce dimensionality, and two fully connected layers to classify the extracted features. The whole purpose of TensorFlow is to have a computational graph. To boost the classification performance, two main con-tributions have been done: data augmentation and multi-optimizers. The data augmentation helps to decrease over-fitting and increase the performance of the model. The training dataset images are augmented by normalizing, rotating, and cropping them. Four different optimizers are used with the TF-CNN, Adagrad, ProximalAdagrad, Adam, and RMSProp to achieve accurate classification. The re-sult demonstrates that the loss value of the Adam and RMSProp optimizers was lower than the Adagrad and ProximalAdagrad optimizers. The classification accuracy using Adam optimizer is 95.8%, while it reaches 100% when using RMSProp optimizer.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124744570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding Human Aging Patterns from a Machine Perspective","authors":"Shixing Chen, Ming Dong, Jialiang Le, S. Barbat","doi":"10.1109/MIPR.2018.00055","DOIUrl":"https://doi.org/10.1109/MIPR.2018.00055","url":null,"abstract":"Recent research shows that the aging patterns deeply learned from large-scale data lead to significant performance improvement on age estimation. However, the insight about why and how deep learning models achieved superior performance is inadequate. In this paper, we propose to analyze, visualize and understand the deep aging patterns. We first train a series of convolutional neural networks for age estimation, and then illustrate the learning outcomes using feature maps, activation histograms, and deconvolution. We also develop a visualization method that can compare the facial appearance and track its changes at different ages through the mapping between 2D images and a 3D face template. Our framework provides an innovative way to understand human facial aging process from a machine perspective.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134034534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}