{"title":"Banknote portrait detection using convolutional neural network","authors":"Ryutaro Kitagawa, Yoshihiko Mochizuki, S. Iizuka, E. Simo-Serra, Hiroshi Matsuki, N. Natori, H. Ishikawa","doi":"10.23919/MVA.2017.7986895","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986895","url":null,"abstract":"Banknotes generally have different designs according to their denominations. Thus, if characteristics of each design can be recognized, they can be used for sorting banknotes according to denominations. Portrait in banknotes is one such characteristic that can be used for classification. A sorting system for banknotes can be designed that recognizes portraits in each banknote and sort it accordingly. In this paper, our aim is to automate the configuration of such a sorting system by automatically detect portraits in sample banknotes, so that it can be quickly deployed in a new target country. We use Convolutional Neural Networks to detect portraits in completely new set of banknotes robust to variation in the ways they are shown, such as the size and the orientation of the face.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126669113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust registration of serial cell microscopic images using 3D Hilbert scan search","authors":"Yongwen Lai, S. Kamata, Zhizhong Fu","doi":"10.23919/MVA.2017.7986917","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986917","url":null,"abstract":"Microscopic images are quite helpful for us to observe the details of cells because of its high resolution. Furthermore it can benefit biologists and doctors to view the cell structure from any aspect by using a serial images to generate 3D cell structure. However each cell slice is placed at the microscopy respectively, which will bring in the arbitrary rotation and translation among the serial slices. What's more, the sectioning process will destroy the cell structure such as tearing or warping. Therefore we must register the serial slices before rendering the volume data in 3D. In this paper we propose a robust registration algorithm based on an improved 3D Hilbert scam search. Besides we put forward a simple but effective method to remove false matching in consecutive images. Finally we correct the local deformation based on optical-flow theory and adopt multi-resolution method. Our algorithm is tested, on a serial microscopy kidney cell images, and the experimental results show how accurate and robust of our method is.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131802438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pedestrian near-miss analysis on vehicle-mounted driving recorders","authors":"Teppei Suzuki, Y. Aoki, Hirokatsu Kataoka","doi":"10.23919/MVA.2017.7986889","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986889","url":null,"abstract":"Recently, a demand for video analysis on vehicle-mounted driving recorders has been increasing in vision-based safety systems, such as for autonomous vehicles. The technology must be positioned one of the most important task, however, the conventional traffic datasets (e.g. KITTI, Caltech Pedestrian) are not included any dangerous scenes (near-miss scenes), even though the objective of a safety system is to avoid danger. In this paper, (i) we create a pedestrian near-miss dataset on vehicle-mounted driving recorders and (ii) propose a method to jointly learns to predict pedestrian detection and its danger level {high, low, no-danger} with convolutional neural networks (CNN) based on the ResNets. According to the result, we demonstrate the effectiveness of our approach that achieved 68% accuracy of joint pedestrian detection and danger label prediction, and 58.6fps processing time on the self-collected pedestrian near-miss dataset.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"230 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133942421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ball-like observation model and multi-peak distribution estimation based particle filter for 3D Ping-pong ball tracking","authors":"Ziwei Deng, Xina Cheng, T. Ikenaga","doi":"10.23919/MVA.2017.7986883","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986883","url":null,"abstract":"3D ball tracking is of great significance to ping-pong game analysis, which can be utilized to applications such as TV content and tactic analysis. To achieve a high success rate in ping-pong ball tracking, the main problems are the lack of unique features and the complexity of background, which make it difficult to distinguish the ball from similar noises. This paper proposes a ball-like observation model and a multi-peak distribution estimation to improve accuracy. For the balllike observation model, we utilize gradient feature from the edge of upper semicircle to construct a histogram, besides, ball-size likelihood is proposed to deal with the situation when noises are different in size with the ball. The multi-peak distribution estimation aims at obtaining a precise ball position in case the partidles' weight distribution has multiple peaks. Experiments are based on ping-pong videos recorded in an official match from 4 perspectives, which in total have 122 hit cases with 2 pairs of players. The tracking success rate finally reaches 99.33%.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"153 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127521618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event based surveillance video synopsis using trajectory kinematics descriptors","authors":"Wei-Cheng Wang, P. Chung, Chun-Rong Huang, Wei-Yun Huang","doi":"10.23919/MVA.2017.7986848","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986848","url":null,"abstract":"Video synopsis has been shown its promising performance in visual surveillance, but the rearranged foreground objects may disorderly occlude to each other which makes end users hard to identify the targets. In this paper, a novel event based video synopsis method is proposed by using the clustering results of trajectories of foreground objects. To represent the kinematic events of each trajectory, trajectory kinematics descriptors are applied. Then, affinity propagation is used to cluster trajectories with similar kinematic events. Finally, each kinematic event group is used to generate an event based synopsis video. As shown in the experiments, the generated event based synopsis videos can effectively and efficiently reduce the lengths of the surveillance videos and are much clear for browsing compared to the states-of-the-art video synopsis methods.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125150193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mixture particle filter with block jump biomechanics constraint for volleyball players lower body parts tracking","authors":"Fanglu Xie, Xina Cheng, T. Ikenaga","doi":"10.23919/MVA.2017.7986861","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986861","url":null,"abstract":"Volleyball player body parts tracking is very important for block or jump height calculation which can be applied to TV contents and tactical analysis. This paper proposes a mixture particle filter with block jump biomechanics constraint based on 3D articulated human model. Using mixture particle filters tracking different body parts can effectively reduce the freedom degree of the human model and make each particle filter track the specific target more accurately. Block jump biomechanics constraint executes adaptive prediction model and likelihood model which can make the particle filter specific for block tracking. The experiments are based on videos of the Final Game of 2014 Japan Inter High School Games of Men's Volleyball in Tokyo. The tracking success rate reached 93.9% for left foot and 93.8% for right foot.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"75 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114132702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA implementation of high frame rate and ultra-low delay vision system with local and global parallel based matching","authors":"Tingting Hu, T. Ikenaga","doi":"10.23919/MVA.2017.7986857","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986857","url":null,"abstract":"High frame rate and ultra-low delay image processing system plays an increasingly important role in human-machine interactive applications which call for a better experience. Current works based on vision chip target on video with simple patterns or simple shapes in order to get a higher speed, while a more complicated system is required for real-life applications. This paper proposes a BRIEF based matching system with high frame rate and ultra-low delay for specific object tracking, implemented on FPGA board. Local parallel and global pipeline based matching and 4-1-4 thread transformation are proposed for the implementation of this system. Local parallel and global pipeline based matching is proposed for high-speed matching. And 4-1-4 thread transformation is proposed to reduce the enormous resource cost caused by highly paralled and pipelined structure. In a broader framework, the proposed image processing system is made parallelized and pipelined for a high throughput which can meet the high frame rate and ultra-low delay system's demand. Evaluation results show that the proposed image processing core can work at 1306fps and 0.808ms delay with the resolution of 640×480. System using the image processing core and a camera with 784fps frame rate and 640×480 resolution is designed.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131799079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic extraction and recognition of shoe logos with a wide variety of appearance","authors":"Kazunori Aoki, W. Ohyama, T. Wakabayashi","doi":"10.23919/MVA.2017.7986838","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986838","url":null,"abstract":"A logo is a symbolic presentation that is designed not only to identify a product manufacturer but also to attract the attention of shoppers. Shoe logos are a challenging subject for automatic extraction and recognition using image analysis techniques because they have characteristics that distinguish them from those of other products, that is, there is much variation in the appearance of shoe logos. In this paper, we propose an automatic extraction and recognition method for shoe logos with a wide variety of appearanee using a limited number training samples. The proposed method employs maximally stable extremal regions (MSERs) for the initial region extraction, an iterative algorithm for region grouping, and gradient features and a support vector machine for logo recognition. The results of performance evaluation experiments using a logo dataset that consists of a wide variety of appearance show that the proposed method achieves promising performance for both logo extraction and recognition.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115361272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time recognition of sign language gestures and air-writing using leap motion","authors":"Pradeep Kumar, Rajkumar Saini, S. Behera, D. P. Dogra, P. Roy","doi":"10.23919/MVA.2017.7986825","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986825","url":null,"abstract":"A sign language is generally composed of three main parts, namely manual signas that are gestures made by hand or fingers movements, non-manual signs such as facial expressions or body postures, and finger-spelling where words are spelt out using gestures by the signers to convey the meaning. In literature, researchers have proposed various Sign Language Recognition (SLR) systems by focusing only one part of the sign language. However, combination of different parts has not been explored much. In this paper, we present a framework to recognize manual signs and finger spellings using Leap motion sensor. In the first phase, Support Vector Machine (SVM) classifier has been used to differentiate between manual and finger spelling gestures. Next, two BLSTM-NN classifiers are used for the recognition of manual signs and finger-spelling gestures using sequence-classification and sequence-transcription based approaches, respectively. A dataset of 2240 sign gestures consisting of 28 isolated manual signs and 28 finger-spelling words, has been recorded involving 10 users. We have obtained an overall accuracy of 63.57% in real-time recognition of sign gestures.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126830850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A linear method for recovering the depth of Ultra HD cameras using a kinect V2 sensor","authors":"Yuan Gao, M. Ziegler, Frederik Zilly, Sandro Esquivel, R. Koch","doi":"10.23919/MVA.2017.7986908","DOIUrl":"https://doi.org/10.23919/MVA.2017.7986908","url":null,"abstract":"Depth-Image-Based Rendering (DIBR) is a mature and important method for making free-viewpoint videos. As for the study of the DIBR approach, on the one hand, most of current research focuses on how to use it in systems with low resolution cameras, while a lot of Ultra HD rendering devices have been launched into markets. On the other hand, the quality and accuracy of the depth image directly affects the final rendering result. Therefore, in this paper we try to make some improvements on solving the problem of recovering the depth information for Ultra HD cameras with the help of a Kinect V2 sensor. To this end, a linear least squares method is proposed, which recovers the rigid transformation between a Kinect V2 and an Ultra HD camera, using the depth information from the Kinect V2 sensor. In addition, a non-linear coarse-to-fine method, which is based on Sparse Bundle Adjustment (SBA), is compared with this linear method. Experiments show that our proposed method performs better than the non-linear method for the Ultra HD depth image recovery both in computing time and precision.","PeriodicalId":193716,"journal":{"name":"2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA)","volume":"934 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116429471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}