{"title":"PCA-LDANet: A Simple Feature Learning Method for Image Classification","authors":"Yukun Ge, Jiani Hu, Weihong Deng","doi":"10.1109/ACPR.2017.36","DOIUrl":"https://doi.org/10.1109/ACPR.2017.36","url":null,"abstract":"In this paper, we propose a simple and effective feature learning architecture for image classification that is built from very basic data processing components: 1) principal component analysis (PCA); 2) linear discriminant analysis (LDA); and 3) binary hashing and blockwise histograms. In this architecture, PCA is employed to reconstruct patches of the input images, and LDA is employed to learn filter banks; simple binary hashing and blockwise histograms then follow for indexing. The architecture is motivated by PCANet and LDANet, whose topologies it partly shares, and is therefore called the PCA-LDA Network (PCA-LDANet). We have tested the PCA-LDANet on two visual datasets for different tasks: the Facial Recognition Technology (FERET) dataset for face recognition, and the MNIST dataset for handwritten digit recognition. To explore the properties and essence of these architectures, we restrict our experiments to one-stage networks, which suffice to illustrate the point. Experimental results show that PCA-LDANet-1 outperforms both PCANet-1 and LDANet-1 on both datasets. The experimental results demonstrate the effectiveness and distinctiveness of the PCA-LDANet, as well as the important role of PCA patch reconstruction within it.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128751931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
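As a rough illustration of the one-stage pipeline this abstract describes, the sketch below learns a PCA filter bank from image patches, convolves, binarizes, hashes the binary maps, and takes blockwise histograms. It is a minimal numpy sketch only: the LDA filter-learning and PCA patch-reconstruction steps of the actual PCA-LDANet are omitted (LDA needs class labels), and the filter size, filter count, and block size are arbitrary choices, not the paper's.

```python
import numpy as np

def pca_filters(images, k=7, n_filters=8):
    # Collect all k x k patches, remove each patch's mean, and take the
    # top right-singular vectors of the patch matrix as convolution filters.
    patches = []
    for img in images:
        h, w = img.shape
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())
    X = np.asarray(patches)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_filters].reshape(n_filters, k, k)

def one_stage_features(img, filters, block=4):
    # Convolve ('valid'), binarize each response map, hash the binary
    # maps into one integer map, then take non-overlapping blockwise
    # histograms as the final feature vector.
    k = filters.shape[1]
    h, w = img.shape
    maps = np.zeros((len(filters), h - k + 1, w - k + 1))
    for f_idx, f in enumerate(filters):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                maps[f_idx, i, j] = np.sum(img[i:i + k, j:j + k] * f)
    hashed = sum((maps[b] > 0).astype(int) << b for b in range(len(filters)))
    hists = []
    for i in range(0, hashed.shape[0] - block + 1, block):
        for j in range(0, hashed.shape[1] - block + 1, block):
            blk = hashed[i:i + block, j:j + block].ravel()
            hists.append(np.bincount(blk, minlength=2 ** len(filters)))
    return np.concatenate(hists)

rng = np.random.default_rng(0)
train = [rng.standard_normal((16, 16)) for _ in range(5)]
filters = pca_filters(train)
feat = one_stage_features(train[0], filters)
print(filters.shape, feat.shape)  # (8, 7, 7) (1024,)
```

With 8 filters, each hashed pixel is an 8-bit code, so every block contributes a 256-bin histogram.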
{"title":"Learning Principal Orientations Descriptor for Action Recognition","authors":"Lei Chen, Jiwen Lu, Zhanjie Song, Jie Zhou","doi":"10.1109/ACPR.2017.28","DOIUrl":"https://doi.org/10.1109/ACPR.2017.28","url":null,"abstract":"In this paper, we propose an unsupervised-learning-based representation method, named the principal orientations descriptor (POD), to describe local and statistical characteristics for action recognition. Unlike hand-crafted features, which require substantial prior knowledge, POD is learned from raw pixels and reflects the distribution of principal orientations around motion trajectories. And unlike deep-learning-based features, which rely on large amounts of labeled data, POD is learned in an unsupervised manner. We learn POD in the spatial domain and the temporal domain separately, based on the same motion trajectories, which gives POD the ability to describe both spatial and temporal information along the same trajectories. To evaluate the performance of POD, we conduct experiments on two challenging action datasets: Hollywood2 and HMDB51. The results show that our method is competitive with existing methods.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121041776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
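One plausible reading of "distribution of principal orientations around the motion trajectories" is sketched below: estimate each patch's principal orientation from the structure tensor of its gradients, then histogram those orientations along a trajectory. This is an illustrative guess at the idea, not the paper's actual learning procedure; the patch size, bin count, and the structure-tensor estimate are all assumptions.

```python
import numpy as np

def principal_orientation(patch):
    # Principal orientation of a patch: the leading eigenvector of the
    # (structure-tensor-style) covariance of its gradient vectors.
    gy, gx = np.gradient(patch)
    g = np.stack([gx.ravel(), gy.ravel()])
    w, v = np.linalg.eigh(g @ g.T)
    vx, vy = v[:, -1]                      # eigenvector of the largest eigenvalue
    return np.arctan2(vy, vx) % np.pi      # orientation is axial, so mod pi

def pod_descriptor(frames, trajectory, k=8, bins=8):
    # Histogram the principal orientations of the patches visited by a
    # trajectory of (frame_index, x, y) points; normalize to sum to 1.
    angles = [principal_orientation(frames[t][y:y + k, x:x + k])
              for (t, x, y) in trajectory]
    hist, _ = np.histogram(angles, bins=bins, range=(0.0, np.pi))
    return hist / max(hist.sum(), 1)

# A patch with purely horizontal gradient has principal orientation ~0.
stripe = np.tile(np.arange(8.0), (8, 1))
print(principal_orientation(stripe))

rng = np.random.default_rng(0)
frames = [rng.standard_normal((32, 32)) for _ in range(4)]
traj = [(0, 5, 5), (1, 6, 5), (2, 7, 6), (3, 8, 7)]
desc = pod_descriptor(frames, traj)
print(desc.shape)  # (8,)
```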
{"title":"Travel Time-Dependent Maximum Entropy Inverse Reinforcement Learning for Seabird Trajectory Prediction","authors":"Tsubasa Hirakawa, Takayoshi Yamashita, K. Yoda, Toru Tamaki, H. Fujiyoshi","doi":"10.1109/ACPR.2017.20","DOIUrl":"https://doi.org/10.1109/ACPR.2017.20","url":null,"abstract":"Trajectory prediction is a challenging problem in the fields of computer vision, robotics, and machine learning, and a number of methods for trajectory prediction have been proposed. Most methods generate trajectories that move toward a goal in a straight line (goal-directed) while avoiding obstacles. However, there are not only such goal-directed trajectories but also trajectories that take detours before reaching the goal (non-goal-directed). In this paper, we propose a method of predicting such non-goal-directed trajectories based on the maximum entropy inverse reinforcement learning framework. Our method introduces travel time as a state of the Markov decision process. As a practical example, we apply the proposed method to seabird trajectories measured using global positioning system loggers. Experimental results show that the proposed method can effectively predict non-goal-directed trajectories.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126298659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
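The key move in the abstract, adding travel time to the MDP state, can be shown on a toy problem. The sketch below runs soft (maximum-entropy-style) value iteration on a 1-D chain whose state is (position, elapsed time); the reward function is invented for illustration and pays off only when the goal is reached at a preferred arrival time, which is what makes non-straight-line behavior (waiting) optimal. This is not the paper's IRL procedure, only the time-augmented MDP it builds on.

```python
import math

# Toy chain MDP with time-augmented state (position, t).
N, T, GOAL, ARRIVE = 5, 8, 4, 6
ACTIONS = (-1, 0, +1)  # left, stay, right (clamped to [0, N-1])

def reward(pos, t):
    # Invented time-dependent reward: the goal only pays off at ARRIVE.
    return 10.0 if (pos == GOAL and t == ARRIVE) else -0.1

def clamp(p):
    return min(max(p, 0), N - 1)

# Soft value iteration backwards over time:
# V[t][pos] = log-sum-exp over actions of reward(next, t+1) + V[t+1][next].
V = [[0.0] * N for _ in range(T + 1)]
for t in range(T - 1, -1, -1):
    for pos in range(N):
        vals = [reward(clamp(pos + a), t + 1) + V[t + 1][clamp(pos + a)]
                for a in ACTIONS]
        m = max(vals)
        V[t][pos] = m + math.log(sum(math.exp(v - m) for v in vals))

# Greedy rollout from (0, 0): because the value depends on time, the
# policy heads to GOAL and then waits there for the payoff at ARRIVE.
pos, path = 0, [0]
for t in range(T):
    best = max(ACTIONS, key=lambda a: reward(clamp(pos + a), t + 1)
               + V[t + 1][clamp(pos + a)])
    pos = clamp(pos + best)
    path.append(pos)
print(path)
```

A time-independent value function over positions alone could not express "be at the goal at time 6", which is exactly the non-goal-directed behavior the paper targets.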
{"title":"Deep Feature Similarity for Generative Adversarial Networks","authors":"Xianxu Hou, Ke Sun, G. Qiu","doi":"10.1109/ACPR.2017.47","DOIUrl":"https://doi.org/10.1109/ACPR.2017.47","url":null,"abstract":"We propose a new way to train generative adversarial networks (GANs) based on a pretrained deep convolutional neural network (CNN). Instead of directly using the generated and real images in pixel space, the corresponding deep features extracted from the pretrained network are used to train the generator and discriminator. We enforce deep feature similarity between the generated and real images to stabilize training and generate more natural-looking images. Testing on face and flower image datasets, we show that the generated samples are clearer and of higher visual quality than those of traditional GANs. A human evaluation demonstrates that humans cannot easily distinguish the fake face images from real ones.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126915622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
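The core substitution, measuring similarity in a frozen feature space rather than pixel space, can be sketched in a few lines. The "feature extractor" below is a fixed random projection standing in for the pretrained CNN (which the paper would use); everything about its shape is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained, frozen CNN: a fixed random projection + ReLU.
W = rng.standard_normal((64, 256)) / 16.0

def features(x):
    return np.maximum(W @ x, 0.0)

def pixel_loss(real, fake):
    # Traditional comparison directly in pixel space.
    return float(np.mean((real - fake) ** 2))

def feature_loss(real, fake):
    # The abstract's proposal: compare deep features of real and
    # generated images instead of raw pixels.
    return float(np.mean((features(real) - features(fake)) ** 2))

real = rng.standard_normal(256)
fake = real + 0.1 * rng.standard_normal(256)
print(pixel_loss(real, fake), feature_loss(real, fake))
```

In the full method this feature distance drives both generator and discriminator updates; here it is isolated as a loss term.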
{"title":"Motion Vector Based Data Association for On-Line Multi-object Tracking","authors":"Cong Ma, Z. Miao, Xiao-Ping Zhang, Min Li","doi":"10.1109/ACPR.2017.54","DOIUrl":"https://doi.org/10.1109/ACPR.2017.54","url":null,"abstract":"On-line multi-object tracking needs to solve the data association problem on each new frame in time-critical video analysis applications. However, associating new detection responses with existing trajectories under the tracking-by-detection framework faces challenges such as mis-detections and false alarms. To build a more reliable frame-by-frame association from the given detection results in applications where precision is the primary requirement, we design a strong association constraint based on motion vectors computed from uniformly sampled keypoints in the scene, while also considering spatial information. Using optical flow analysis between two successive frames, we propose a new cost function for building the association matrix and solve the multi-object tracking problem in an on-line form. Experimental results on challenging benchmark datasets show that our method achieves overall state-of-the-art performance and is especially effective in reducing false alarms.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127426543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
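The general shape of such a frame-by-frame association step can be sketched with stdlib Python: predict each track's position from the motion vectors of nearby keypoints, build a cost matrix against the new detections, and match under a gating threshold. The paper's actual cost function and solver are not specified here; the mean-flow prediction, Euclidean cost, greedy matcher, and gate value are all illustrative assumptions.

```python
import math

def predict(track, flow):
    # Shift the track's last position by the mean motion vector of the
    # keypoints sampled near it (stand-in for the optical-flow step).
    x, y = track
    dx = sum(f[0] for f in flow) / len(flow)
    dy = sum(f[1] for f in flow) / len(flow)
    return (x + dx, y + dy)

def cost_matrix(tracks, flows, detections):
    # Association cost: distance between each motion-predicted track
    # position and each new detection (the spatial constraint).
    preds = [predict(t, f) for t, f in zip(tracks, flows)]
    return [[math.dist(p, d) for d in detections] for p in preds]

def greedy_assign(cost, gate=20.0):
    # Greedy one-to-one association; pairs above the gate stay
    # unmatched (candidate mis-detections / new tracks).
    pairs, used_t, used_d = [], set(), set()
    flat = sorted((c, i, j) for i, row in enumerate(cost)
                  for j, c in enumerate(row))
    for c, i, j in flat:
        if c <= gate and i not in used_t and j not in used_d:
            pairs.append((i, j))
            used_t.add(i)
            used_d.add(j)
    return pairs

tracks = [(10.0, 10.0), (50.0, 50.0)]
flows = [[(2.0, 0.0), (2.0, 0.0)], [(0.0, -3.0)]]
dets = [(49.5, 47.2), (12.1, 10.1), (90.0, 90.0)]
matches = greedy_assign(cost_matrix(tracks, flows, dets))
print(matches)  # [(0, 1), (1, 0)] -- detection 2 is left unmatched
```

A Hungarian solver would replace the greedy matcher in a production tracker; greedy keeps the sketch dependency-free.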
{"title":"Discriminative Transfer Learning Siamese CNN for Person Re-identification","authors":"Yuan Tian, Cairong Zhao, Kang Chen, Yipeng Chen, Zhihua Wei, D. Miao","doi":"10.1109/ACPR.2017.119","DOIUrl":"https://doi.org/10.1109/ACPR.2017.119","url":null,"abstract":"Person re-identification (Re-ID) has become an increasingly popular computer vision problem. It remains challenging, especially across non-overlapping cameras. In this paper, we review the two representative architectures, i.e., the identification and verification models; each has its own advantages and limitations. We present a novel method to address the Re-ID problem. First, we combine the two models to form a more effective fused loss function. Second, we find that CNNs pre-trained on large image datasets learn more discriminative, semantically meaningful knowledge, which can be transferred to subsequent layers to improve accuracy significantly. Experiments on four benchmark datasets show the superiority of our method over state-of-the-art alternatives.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115528373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
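Combining the identification and verification models typically means summing a per-image classification loss with a pairwise loss on the embeddings. The sketch below uses cross-entropy for identification and a contrastive term for verification; the paper's exact fusion is not given in the abstract, so the weighting `lam` and `margin` here are illustrative assumptions.

```python
import numpy as np

def softmax_ce(logits, label):
    # Identification term: cross-entropy over identity classes.
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return float(-logp[label])

def fused_loss(feat_a, feat_b, logits_a, logits_b, id_a, id_b,
               same, margin=2.0, lam=0.5):
    # Identification: classify each image's identity independently.
    ident = softmax_ce(logits_a, id_a) + softmax_ce(logits_b, id_b)
    # Verification: contrastive loss on the Siamese embedding pair.
    d = float(np.linalg.norm(feat_a - feat_b))
    verif = d ** 2 if same else max(0.0, margin - d) ** 2
    return ident + lam * verif

rng = np.random.default_rng(0)
fa = rng.standard_normal(64)
logits = rng.standard_normal(10)
same_pair = fused_loss(fa, fa, logits, logits, 3, 3, True)
near_negative = fused_loss(fa, fa + 0.01, logits, logits, 3, 5, False)
print(same_pair, near_negative)
```

Identical embeddings with the same identity incur no verification penalty, while near-identical embeddings of different people are pushed apart by the margin term.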
{"title":"Re-ranking Person Re-identification with Local Discriminative Information","authors":"Kezhou Chen, N. Sang, Zhiqiang Li, Changxin Gao, Ruolin Wang","doi":"10.1109/ACPR.2017.1","DOIUrl":"https://doi.org/10.1109/ACPR.2017.1","url":null,"abstract":"Most existing metric-learning-based person re-identification methods try to learn a global distance metric to measure the similarity between person images. But owing to large intra-class variations, pedestrian data follow a highly irregular distribution in the feature space, and a global metric model can hardly exploit the discriminative information in local distributions. Because similarity is higher within a local distribution, local information should therefore be carefully mined and exploited to improve matching accuracy, especially for hard positive images. In this paper, we propose to combine the global metric with local information to resolve failed matching cases. Specifically, for a testing pair, we first search the training set for positive pairs whose feature differences under the global metric are similar to those of the testing pair. If most of these positive pairs are located in the local range of the testing pair, the global metric is believed to reflect the similarity relationship in that local area. According to the degree to which the local discriminative information is represented in the global metric, the similarity of the testing pair is then derived from both the global metric and the pair's local information. Finally, all gallery images are re-ranked according to the combined similarity scores. Experimental results on the VIPeR, PRID450S and Market-1501 datasets clearly demonstrate the effectiveness of the proposed method.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"448 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114096855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Speaker Naming via Deep Audio-Face Fusion and End-to-End Attention Model","authors":"Xin Liu, Jiajia Geng, Haibin Ling","doi":"10.1109/ACPR.2017.13","DOIUrl":"https://doi.org/10.1109/ACPR.2017.13","url":null,"abstract":"Speaker naming, identifying the speaking character in a movie video, has recently received wide attention; it is an extremely challenging topic, mainly owing to the significant variation of facial appearance. Motivated by multimodal applications, we present an efficient speaker naming approach via deep audio-face fusion and an end-to-end attention model. First, we start with LSTM encoding of the acoustic features and VGG encoding of the face images, and then derive an end-to-end common attention vector by convolution-softmax encoding of their locally concatenated features, whereby the face attention vector can be well discriminated. Further, we apply a low-rank bilinear model to efficiently fuse the face attention vector and the acoustic feature vector, whereby a discriminative joint audio-face representation is obtained for speaker naming. In addition, we describe an alternative acoustic feature representation scheme based on convolutional encoding, which can replace the LSTM in the network to speed up training. The experimental results show that our proposed speaker naming approach yields comparable and even better results than the state-of-the-art counterparts.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"189 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124188886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
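The low-rank bilinear fusion mentioned above has a standard form: project both modalities into a shared low-rank space, take the Hadamard product, and project out, i.e. z = Pᵀ(σ(Uᵀa) ∘ σ(Vᵀf)). A minimal numpy sketch follows; the dimensions and tanh nonlinearity are illustrative choices, and in the paper U, V, P would be learned jointly with the rest of the network.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative dimensions for audio feature, face attention vector,
# low-rank bottleneck, and fused output.
d_audio, d_face, rank, d_out = 128, 512, 32, 64
U = rng.standard_normal((d_audio, rank)) * 0.05
V = rng.standard_normal((d_face, rank)) * 0.05
P = rng.standard_normal((rank, d_out)) * 0.05

def fuse(a, f):
    # Hadamard product in the rank-32 space replaces the full bilinear
    # tensor, cutting parameters from d_audio * d_face * d_out down to
    # (d_audio + d_face + d_out) * rank.
    return (np.tanh(U.T @ a) * np.tanh(V.T @ f)) @ P

a = rng.standard_normal(d_audio)   # acoustic feature vector
f = rng.standard_normal(d_face)    # face attention vector
z = fuse(a, f)
print(z.shape)  # (64,)
```

The parameter saving is the point: a full bilinear map here would need 128 × 512 × 64 ≈ 4.2M weights versus about 22K for the low-rank factorization.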
{"title":"Learning a Smart Convolutional Neural Network with High-Level Semantic Information","authors":"Xinshu Qiao, Chunyan Xu, Jian Yang, Jiatao Jiang","doi":"10.1109/ACPR.2017.87","DOIUrl":"https://doi.org/10.1109/ACPR.2017.87","url":null,"abstract":"With the wide application of big data and the growth of computing capability, deep Convolutional Neural Networks (CNNs) have been widely applied in computer vision. Current deep network architectures are becoming deeper and more complex in pursuit of better performance. However, their inherent disadvantages, such as large computation and memory consumption and long run-times, make CNN models difficult to deploy on mobile and embedded devices. In this paper, we learn a Smart Convolutional Neural Network (S-CNN) under the guidance of neurons' high-level semantic information distilled from a cumbersome neural network. S-CNN can be seen as an improved CNN model that consumes less computation and memory at prediction time. We verify the superiority of S-CNN on the image classification task on three benchmark datasets: CIFAR-10, CIFAR-100 and SVHN. Experimental results clearly demonstrate that the proposed S-CNN achieves compelling performance compared with traditional CNN models.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121284069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
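"Guidance from high-level semantic information distilled from a cumbersome network" is usually realized as a distillation loss: the small network mimics the large network's softened outputs while also fitting the labels. The sketch below shows the standard Hinton-style soft-target form; the abstract does not specify the paper's exact loss, so the temperature, weighting, and this particular formulation are assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, labels_onehot,
                      T=4.0, alpha=0.7):
    # Soft-target term: the student matches the cumbersome teacher's
    # temperature-softened outputs (the "high-level semantic" signal).
    soft = softmax(teacher_logits, T)
    kd = -np.sum(soft * np.log(softmax(student_logits, T) + 1e-12)) * T * T
    # Hard-target term: ordinary cross-entropy with the true labels.
    ce = -np.sum(labels_onehot * np.log(softmax(student_logits) + 1e-12))
    return float(alpha * kd + (1 - alpha) * ce)

teacher = np.array([8.0, 2.0, 1.0])
student = np.array([6.0, 1.5, 0.5])
labels = np.array([1.0, 0.0, 0.0])
print(distillation_loss(student, teacher, labels))
```

The T² factor keeps the soft-target gradients on the same scale as the hard-target gradients as the temperature changes.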
{"title":"A New GVF Arrow Pattern for Character Segmentation from Double Line License Plate Images","authors":"P. Shivakumara, Aishik Konwer, A. Bhowmick, Vijeta Khare, U. Pal, Tong Lu","doi":"10.1109/ACPR.2017.45","DOIUrl":"https://doi.org/10.1109/ACPR.2017.45","url":null,"abstract":"License plate recognition remains an open problem for several developing countries because of its many challenges. One such challenge is character segmentation from double-line license plate images (alphabets on one line and numerals on the other), where adjacent characters touch horizontally and the two lines touch vertically; this is a major cause of poor recognition performance. We therefore propose a novel technique based on Gradient Vector Flow (GVF) to segment characters from double-line license plate images. The proposed technique explores a new GVF arrow pattern that represents the spaces between lines and characters, based on the observation that the concavities formed between characters and between lines attract GVF arrows in a distinctive fashion. This observation leads to seed space patches for segmentation. The spatial coordinates of the seed space patches are passed through a Hough transform to find the line separator. Next, the proposed technique searches for seed space patches perpendicular to the line separator to find the character separators. Experimental results on double-line license plate images show that the proposed technique is robust to touching, rotation, scaling and distortion, and outperforms existing character segmentation methods. Recognition experiments before and after segmentation show that the proposed segmentation significantly improves the license plate recognition rate.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"1 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123300918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
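The Hough-transform step in this pipeline, turning the coordinates of seed space patches into a line separator, can be shown in isolation with stdlib Python. The GVF stage that produces the seed points is omitted; the seed coordinates below are synthetic stand-ins, with a little vertical jitter to mimic noisy patch centers.

```python
import math
from collections import Counter

# Synthetic (x, y) centers of "seed space patches" lying roughly along
# the gap between the two text lines (y ~= 20, with jitter).
seeds = [(x, 20 + (1 if x % 3 == 0 else 0)) for x in range(0, 60, 4)]

# Standard Hough voting: each point votes for every line
# rho = x*cos(theta) + y*sin(theta) passing through it.
votes = Counter()
for x, y in seeds:
    for theta_deg in range(0, 180, 5):
        t = math.radians(theta_deg)
        rho = round(x * math.cos(t) + y * math.sin(t))
        votes[(rho, theta_deg)] += 1

# The top cell recovers the separator: near-horizontal, around y = 20.
(rho, theta), n = votes.most_common(1)[0]
print(rho, theta, n)
```

The same machinery, run on seed patches perpendicular to this separator, would yield the vertical character separators the abstract describes.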