{"title":"Marginal Deep Architectures","authors":"G. Zhong, Hongxu Wei, Yuchen Zheng, Junyu Dong","doi":"10.1109/ACPR.2017.88","DOIUrl":"https://doi.org/10.1109/ACPR.2017.88","url":null,"abstract":"Many deep architectures have been proposed in recent years. To obtain good results, most of the previous deep models need a large number of training data. In this paper, for small and middle scale applications, we propose a novel deep learning framework based on stacked feature learning models. Particularly, we stack marginal Fisher analysis (MFA) layer by layer for the initialization of the deep architecture and call it \"Marginal Deep Architectures\" (MDA). In the implementation of MDA, the weight matrices of MFA are first learned layer by layer, and then some deep learning techniques are employed to fine tune the deep architecture. To evaluate the effectiveness of MDA, we have compared it with related feature learning methods and deep learning models on six small and middle scale real-world applications. Extensive experiments demonstrate that MDA performs not only better than shallow feature learning models, but also deep learning models in these applications.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121611949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Face Recognition under Eyeglass and Scale Variation Using Extended Siamese Network","authors":"Fan Qiu, S. Kamata, Lizhuang Ma","doi":"10.1109/ACPR.2017.48","DOIUrl":"https://doi.org/10.1109/ACPR.2017.48","url":null,"abstract":"Face recognition has attracted much attention from researchers for past decades. Recently, with the development of deep learning, a deep neural network is adopted by face recognition system and better performance is obtained. Many works on metric learning have been done in the deep neural network. Meanwhile, there are several variation problems existing in face recognition, such as profile face image, low-resolution face image, different age of face image, face image wearing eyeglass, etc. In this paper, targeting at different kinds of variation problems, we proposed a novel network structure, called Extended Siamese Network. Another contribution is that a new loss function is proposed, to further take inter-class information into account based on the center loss function. The experiments show that recognition accuracy is improved in comparison with the other state-of-art methods.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114673833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Jointly Sparse Regression for Image Feature Selection","authors":"Dongmei Mo, Zhihui Lai","doi":"10.1109/ACPR.2017.49","DOIUrl":"https://doi.org/10.1109/ACPR.2017.49","url":null,"abstract":"In this paper, we proposed a novel model called Robust Jointly Sparse Regression (RJSR) for image feature selection. In the proposed model, the L21-norm based loss function is robust to outliers and the L21-norm regularization term guarantees the joint sparsity for feature selection. In addition, the model can solve the small-class problem in the regression-based methods or the LDA-based methods. Comparing with the traditional L21-norm minimization based methods, the proposed method is more robust to noise since the flexible factor and the robust measurement are incorporated into the model to perform feature extraction and selection. An alternatively iterative algorithm is designed to compute the optimal solution. Experimental evaluation on several well-known data sets shows the merits of the proposed method on feature selection and classification, especially in the case when the face image is corrupted by block noise.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124063979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy-Conscious Person Re-identification Using Low-Resolution Videos","authors":"Mingxie Zheng, K. Tsuji, Nobuhiro Miyazaki, Yuji Matsuda, Takayuki Baba, E. Segawa, Y. Uehara","doi":"10.1109/ACPR.2017.46","DOIUrl":"https://doi.org/10.1109/ACPR.2017.46","url":null,"abstract":"This paper proposes a person re-identification method for obtaining human flow information from low-resolution video generated by surveillance cameras. A requisite for the use of cameras in public spaces is protection of the privacy of individuals appearing in the captured videos. Thus, low-resolution videos (e.g. head sizes are 3-8 pixels) are expected to solve the problem of privacy, which make faces unrecognizable. However, person re-identification is more difficult in low-resolution videos than in high-resolution videos. The reason is that the person-occupied region consists of fewer pixels and has less information. Our proposed method re-identifies a person using the color features extracted from broad regions, which we consider as the most basic and important features for low-resolution videos. The color feature extraction is based on vertical relationships such as a person's head and his/her clothing because those are kept in low-resolution videos. In addition, we select the common color features, which do not change significantly between cameras. In an evaluation experiment with low-resolution videos, the re-identification accuracy of the proposed method is 71%, which is equivalent to that of manual re-identification from low-resolution videos.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115440417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Enhanced Competitive Code with Compacted ST for 3D Palmprint Recognition","authors":"Lunke Fei, Shaohua Teng, Wu Jigang, Yong Xu, Jie Wen, Chunwei Tian","doi":"10.1109/ACPR.2017.62","DOIUrl":"https://doi.org/10.1109/ACPR.2017.62","url":null,"abstract":"As one of important biometric traits, three dimensional (3D) palmprint has recently drawn considerable research interest in the field of palmprint-based authentication. Because 3D palmprint images have rich depth information and are difficult to be counterfeited. In this paper, a novel enhanced competitive code (Ecomp) is proposed to effectively represent the orientation features of palmprint by emphasizing the significance of the orientation, and a simple and effective compact-surface-type (cST) is used to describe the surface structures of 3D palmprint. The addition and multiplication schemes are respectively proposed to effectively combine the Ecomp and cST maps, and the proposed descriptors can better represent not only the 2D orientation but also the 3D surface shapes of the 3D palmprint. Experimental results on the widely used 3D palmprint database are presented to demonstrate the effectiveness of the proposed method on both 3D palmprint verification and identification.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"387 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123530299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Autoencoder of Attribute Constraint for Zero-Shot Classification","authors":"Kun Wang, Songsong Wu, Guangwei Gao, Quan Zhou, Xiaoyuan Jing","doi":"10.1109/ACPR.2017.129","DOIUrl":"https://doi.org/10.1109/ACPR.2017.129","url":null,"abstract":"The goal of zero-shot classification (ZSC) isto classify target classes precisely based on learning asemantic mapping from a feature space to a semanticknowledge space. However, the learned semantic mappingis only concerned with predicting source classes. Applyingthe semantic mapping to target classes directly will sufferfrom the semantic shift problem. In this paper, we proposea novel method called autoencoder of attribute constraint(AOAC) to settle this problem. In AOAC, we adopt theencoder-decoder paradigm to learn the semantic mapping.Additionally, we take the inaccurate attributes of sourceimages into consideration and generate virtual data to solveit. The experimental results on two challenging datasetsshow that our proposed AOAC can resolve the semanticshift problem effectively and also improve the computationalspeed significantly.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129573396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Words Speak for Actions: Using Text to Find Video Highlights","authors":"Sukanya Kudi, A. Namboodiri","doi":"10.1109/ACPR.2017.141","DOIUrl":"https://doi.org/10.1109/ACPR.2017.141","url":null,"abstract":"Video highlights are a selection of the most interesting parts of a video. The problem of highlight detection has been explored for video domains like egocentric, sports, movies, and surveillance videos. Existing methods are limited to finding visually important parts of the video but does not necessarily learn semantics. Moreover, the available benchmark datasets contain audio muted, single activity, short videos, which lack any context apart from a few keyframes that can be used to understand them. In this work, we explore highlight detection in the TV series domain, which features complex interactions with the surroundings. The existing methods would fare poorly in capturing the video semantics in such videos. To incorporate the importance of dialogues/audio, we propose using the descriptions of shots of the video as cues to learning visual importance. Note that while the audio information is used to determine visual importance during training, the highlight detection still works using only the visual information from videos. We use publicly available text ranking algorithms to rank the descriptions. The ranking scores are used to train a visual pairwise shot ranking model (VPSR) to find the highlights of the video. The results are reported on TV series videos of the VideoSet dataset and a season of Buffy the Vampire Slayer TV series.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128868733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Sparse Adversarial Dictionaries for Multi-class Audio Classification","authors":"Vaisakh Shaj, Puranjoy Bhattacharya","doi":"10.1109/ACPR.2017.137","DOIUrl":"https://doi.org/10.1109/ACPR.2017.137","url":null,"abstract":"Audio events are quite often overlapping in nature, and more prone to noise than visual signals. There has been increasing evidence for the superior performance of representations learned using sparse dictionaries for applications like audio denoising and speech enhancement. This paper concentrates on modifying the traditional reconstructive dictionary learning algorithms, by incorporating a discriminative term into the objective function inorder to learn class specific adversarial dictionaries that are good at representing samples of their own class at the same time poor at representing samples belonging to any other class. We quantitatively demonstrate the effectiveness of our learned dictionaries as a stand-alone solution for both binary as well as multi-class audio classification problems.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128954795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding Compact Class Sets for Korean Font Image Classification","authors":"Seungwon Shin, Dongkyu Kim, Homin Park, Byungkon Kang, Kyung-ah Sohn","doi":"10.1109/ACPR.2017.97","DOIUrl":"https://doi.org/10.1109/ACPR.2017.97","url":null,"abstract":"We address the problem of finding compact class sets for Korean font images taken under natural and noisy circumstances. Korean font images are prone to misclassification due to the similar, yet subtly different visual characteristics. The classification becomes even more confusing when the images are subject to various pixel-wise or affine translations, such as scaling and shear mapping. We argue that many font class divisions are inherently flawed in the sense that the fonts are divided in an overly-fine manner. To tackle this issue, we propose a system that discovers compact class sets, based on the confusion matrix of the initial classifier. We demonstrate that grouping existing classes into new ones increases the classification accuracy of Korean fonts, and also results in qualitatively intuitive new classes.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129855473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Text and Symbol Extraction in Traffic Panel from Natural Scene Images","authors":"Zhen-Mao Li, Lin-Lin Huang","doi":"10.1109/ACPR.2017.71","DOIUrl":"https://doi.org/10.1109/ACPR.2017.71","url":null,"abstract":"Traffic panels contain rich text and symbolic information for transportation and scene understanding. In order to understand the information in panels, fast and robust extraction of the text and symbol is a crucial and essential step. This problem cannot be solved using generic scene text detection methods due to the special layout characteristics, especially in Chinese panels. In this paper, we propose a fast and robust approach for Chinese text and symbol extraction in traffic panels from natural scene images. Given a traffic panel in natural scene, Contrasting Extremal Region (CER) algorithm is applied to extract character candidates which are further filtered by boosting classifier using Histogram Orientation Gradient Features. Since Chinese characters often consist of multiple isolated strokes, a hierarchical clustering process of stroke components is carried out to group isolated strokes into characters using the detected characters as seeds. Next, the Chinese text lines are formed by Distance Metric Learning (DSL) method. In consideration that traffic symbols do not possibly appear in the location of texts, symbols are extracted using two stages boosting classifier after text detection. Experimental results on real traffic images from Baidu Street View demonstrate the effectiveness of the proposed method.","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125350615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}