{"title":"Recognition of JSL finger spelling using convolutional neural networks","authors":"Hana Hosoe, Shinji Sako, B. Kwolek","doi":"10.23919/MVA.2017.7986796","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986796","url":null,"abstract":"Recently, a few methods for recognition of hand postures on depth maps using convolutional neural networks were proposed. In this paper, we present a framework for recognition of static finger spelling in Japanese Sign Language. The recognition takes place on the basis of single gray image. The finger spelled signs are recognized using a convolutional neural network. A dataset consisting of5000 samples has been recorded. A 3D articulated hand model has been designed to generate synthetic finger spellings and to extend the real hand gestures. Experimental results demonstrate that owing to sufficient amount of training data a high recognition rate can be attained on images from a single RGB camera. The full dataset and Caffe model are available for download.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116774172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised video object segmentation by supertrajectory labeling","authors":"Masahiro Masuda, Yoshihiko Mochizuki, H. Ishikawa","doi":"10.23919/MVA.2017.7986897","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986897","url":null,"abstract":"We propose a novel approach to unsupervised video segmentation based on the trajectories of Temporal Super-pixels (TSPs). We cast the segmentation problem as a trajectory-labeling problem and define a Markov random field on a graph in which each node represents a trajectory of TSPs, which we minimize using a new two-stage optimization method we developed. The adaption of the trajectories as basic building blocks brings several advantages over conventional superpixel-based methods, such as more expressive potential functions, temporal coherence of the resulting segmentation, and drastically reduced number of the MRF nodes. The most important effect is, however, that it allows more robust segmentation of the foreground that is static in some frames. The method is evaluated on a subset of the standard SegTrack benchmark and yields competitive results against the state-of-the-art methods.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132217744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Raspberry Pi 2-based stereo camera depth meter","authors":"J. Cooper, Mihailo Azhar, Trevor Gee, W. V. D. Mark, P. Delmas, G. Gimel'farb","doi":"10.23919/MVA.2017.7986854","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986854","url":null,"abstract":"The Raspberry Pi single-board computer is a low cost, light weight system with small power requirements. It is an attractive embedded computer vision solution for many applications, including that of UAVs. Here, we focus on the Raspberry Pi 2 and demonstrate that, with the addition of a multiplexer and two camera modules, it is able to execute a full stereo matching pipeline, making it a suitable depth metering device for UAV usage. Our experimental results demonstrate that the proposed configuration is capable of performing reasonably accurate depth estimation for a system moving at a rate of 1 ms−1 when in good lighting conditions.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128695731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast low-level multi-scale feature extraction for hexagonal images","authors":"S. Coleman, B. Scotney, B. Gardiner","doi":"10.23919/MVA.2017.7986871","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986871","url":null,"abstract":"Inspired by the human vision system and its capability to process in real-time, an efficient framework for low-level feature extraction on hexagonal pixel-based images is presented. This is achieved by utilizing the spiral architecture addressing scheme to simulate eye-tremor along with the convolution of non-overlapping gradient masks. Using sparse spiral convolution and the development of cluster operators, we obtain a set of output image responses “a-trous” that is subsequently collated into a consolidated output response; it is also demonstrated that this framework can be extended to feature extraction at different scales. We show that the proposed framework is considerably faster than using conventional spiral convolution or the use of look-up tables for direct access to hexagonal pixel neighbourhood addresses.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"434 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132948095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of online machine vision system using support vector regression (SVR) algorithm for grade prediction of iron ores","authors":"A. K. Patel, S. Chatterjee, A. Gorai","doi":"10.23919/MVA.2017.7986823","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986823","url":null,"abstract":"The present study attempts to develop a machine vision system for continuous monitoring of grades of iron ores during transportation through conveyor belts. The machine vision system was developed using the support vector regression (SVR) algorithm. A radial basis function (RBF) kernel was used for the development of optimized hyperplane by transforming input space into large dimensional feature space. A set of 39-image features (27-colour and 12-texture) were extracted from each of the 88-captured images of iron ore samples. The grade values of iron ore samples corresponding to the 88-captured images were analyzed in the laboratory. The SVR model was developed using the optimized feature subset obtained using a genetic algorithm. The correlation coefficient between the actual grades and model predicted grades for testing samples was found to be 0.8244.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125922224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Skin beautification detection using sparse coding","authors":"Tianyang Sun, Xinyu Hui, Zihao Wang, Shengping Zhang","doi":"10.23919/MVA.2017.7986916","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986916","url":null,"abstract":"In the past years, skin beautifying softwares have been widely used in portable devices for social activities, which have the functionalities of turning one's skin into flawless complexion. With a huge number of photos uploaded to social media, it is useful for users to distinguish whether a photo is beautified or not. To address this problem, in this paper, we propose a skin beautification detection method by mining and distinguishing the intrinsic features of original photos and the corresponding beautified photos. To this aim, we propose to use sparse coding to learn two sets of basis functions using densely sampled patches from the original photos and the beautified photos, respectively. To detect whether a test photo is beautified, we represent the sampled patches from the photo using the learned basis functions and then see which set of basis functions produces more sparse coefficients. To our knowledge, our effort is the first one to detect skin beautification. To validate the effectiveness of the proposed method, we collected about 1000 photos including both the original photos and the photos beautified by a software. Our experimental results indicate the proposed method achieved a desired detection accuracy of over 80%.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128926187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical zero-shot classification with convolutional neural network features and semantic attribute learning","authors":"Jared Markowitz, Aurora C. Schmidt, P. Burlina, I-J. Wang","doi":"10.23919/MVA.2017.7986834","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986834","url":null,"abstract":"We examine hierarchical approaches to image classification problems that include categories for which we have no training examples. Building on prior work in hierarchical classification that optimizes the trade-off between depth in a tree and accuracy of placement, we compare the performance of multiple formulations of the problem on both previously seen (non-novel) and previously unseen (novel) classes. We use a subset of 150 object classes from the ImageNet ILSVRC2012 data set, for which we have 218 human-annotated semantic attribute labels and for which we compute deep convolutional features using the OVERFEAT network. We quantitatively evaluate several approaches, using input posteriors derived from distances to SVM classifier boundaries as well as input posteriors based on semantic attribute estimation. We find that the relative performances of the methods differ in non-novel and novel applications and achieve information gains in novel applications through the incorporation of attribute-based posteriors.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"170 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127230146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Connecting the dots: Embodied visual perception from first-person cameras","authors":"Jianbo Shi","doi":"10.23919/MVA.2017.7986843","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986843","url":null,"abstract":"A computer has a complete photographical memory. It creates massive but isolated sensory moments. Unlike such fragmented photographic memory, human memories are highly connected through episodes that allow us to relate past experiences and predict future actions. How to computationally model a human-like episodic memory system that connects photographically accurate sensory moments? Our insight is that an active interaction is a key to link between episodes because sensory moments are fundamentally centered on an active person-self. Our experiences are created by and shared through our social and physical interactions, i.e., we connect episodes driven by similar actions and, in turn, recall these past connected episodes to take a future actions. Therefore, connecting the dotted moments to create an episodic memory requires understanding the purposeful interaction between human (person-self) and world.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127339966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Saliency/non-saliency segregation in video sequences using perception-based local ternary pattern features","authors":"K. L. Chan","doi":"10.23919/MVA.2017.7986912","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986912","url":null,"abstract":"The detection of salient objects in video sequence is an active research area of computer vision. One approach is to perform joint segmentation of objects and background in each image frame of the video. The background scene is learned and modeled. Each pixel is classified as background if it matches the background model. Otherwise the pixel belongs to a salient object. The segregation method faces many difficulties when the video sequence is captured under various dynamic circumstances. To tackle these challenges, we propose a novel perception-based local ternary pattern for background modeling. The local pattern is fast to compute and is insensitive to random noise, scale transform of intensity. The pattern feature is also invariant to rotational transform. We also propose a novel scheme for matching a pixel with the background model within a spatio-temporal domain. Furthermore, we devise two feedback mechanisms for maintaining the quality of the result over a long video. First, the background model is updated immediately based on the background subtraction result. Second, the detected object is enhanced by adjustment of the segmentation conditions in proximity via a propagation scheme. We compare our method with state-of-the-art background/foreground segregation algorithms using various video datasets.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125361163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Damage detection from aerial images via convolutional neural networks","authors":"A. Fujita, Ken Sakurada, T. Imaizumi, R. Ito, S. Hikosaka, R. Nakamura","doi":"10.23919/MVA.2017.7986759","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986759","url":null,"abstract":"This paper explores the effective use of Convolutional Neural Networks (CNNs) in the context of washed-away building detection from pre- and post-tsunami aerial images. To this end, we compile a dedicated, labeled aerial image dataset to construct models that classify whether a building is washed-away. Each datum in the set is a pair of pre- and post-tsunami image patches and encompasses a target building at the center of the patch. Using this dataset, we comprehensively evaluate CNNs from a practical-application viewpoint, e.g., input scenarios (pre-tsunami images are not always available), input scales (building size varies) and different configurations for CNNs. The experimental results show that our CNN-based washed-away detection system achieves 94–96% classification accuracy across all conditions, indicating the promising applicability of CNNs for washed-away building detection.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"269 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122985591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}