{"title":"Text detection in born-digital images by mass estimation","authors":"Jiamin Xu, P. Shivakumara, Tong Lu, C. Tan, M. Blumenstein","doi":"10.1109/ACPR.2015.7486591","DOIUrl":"https://doi.org/10.1109/ACPR.2015.7486591","url":null,"abstract":"There is a need for effective web-document understanding due to the explosive growth of Internet and network technologies. In this paper, we propose a new method for text detection in born-digital images by introducing a mass estimation concept. We explore super-pixel information from different color channels to identify text atoms in images. The proposed method uses similarity graphs and spectral clustering to identify candidate text regions. We propose a new idea of mapping the Gabor responses of a candidate text region to a spatial circle to study the spatial coherency of pixels. Mass estimation then identifies text candidates from the pixel distribution in the spatial circle. Linear linkage graphs help in grouping text candidates to obtain full text lines. The same Gabor responses are used as features to eliminate false positives with an SVM classifier. We evaluate the proposed method on standard datasets, such as ICDAR 2013 (challenge-1) and the Situ et al. dataset. Experimental results on both datasets show that the proposed method outperforms existing methods.","PeriodicalId":240902,"journal":{"name":"2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131737744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
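The spectral-clustering step described in the abstract above can be sketched in a few lines of NumPy. This is a generic illustration, not the paper's implementation: the RBF similarity graph, the two-cluster setting, and the toy "super-pixel color features" are all assumptions on our part.

```python
import numpy as np

def spectral_cluster_2way(features, sigma=1.0):
    """Two-way spectral clustering via the normalized graph Laplacian."""
    # Pairwise RBF similarity graph over the feature vectors.
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    # Normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # The eigenvector of the second-smallest eigenvalue (the Fiedler
    # vector) encodes the bipartition; split on its sign.
    _, vecs = np.linalg.eigh(L)
    return (vecs[:, 1] > 0).astype(int)

# Toy "super-pixel color features": two nearby but distinct blobs.
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.1, size=(10, 3))
blob_b = rng.normal(1.5, 0.1, size=(10, 3))
labels = spectral_cluster_2way(np.vstack([blob_a, blob_b]))
```

In a full pipeline the rows of `features` would be per-super-pixel color statistics and the number of clusters would be chosen per image; both are fixed here for brevity.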
{"title":"Beyond human recognition: A CNN-based framework for handwritten character recognition","authors":"Li Chen, Song Wang, Wei-liang Fan, Jun Sun, S. Naoi","doi":"10.1109/ACPR.2015.7486592","DOIUrl":"https://doi.org/10.1109/ACPR.2015.7486592","url":null,"abstract":"Because of its varied appearance (different writers, writing styles, noise, etc.), handwritten character recognition is one of the most challenging tasks in pattern recognition. Through decades of research, traditional methods have reached their limits, while the emergence of deep learning provides a new way to break through them. In this paper, a CNN-based handwritten character recognition framework is proposed. In this framework, proper sample generation, a suitable training scheme and a CNN network structure are employed according to the properties of handwritten characters. In the experiments, the proposed framework performed even better than humans on handwritten digit (MNIST) and Chinese character (CASIA) recognition. The advantage of this framework is confirmed by these experimental results.","PeriodicalId":240902,"journal":{"name":"2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"304 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131785692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New texture-spatial features for keyword spotting in video images","authors":"P. Shivakumara, Guozhu Liang, Sangheeta Roy, U. Pal, Tong Lu","doi":"10.1109/ACPR.2015.7486532","DOIUrl":"https://doi.org/10.1109/ACPR.2015.7486532","url":null,"abstract":"Keyword spotting in video document images is challenging due to the low resolution and complex backgrounds of video images. We propose a combination of Texture-Spatial-Features (TSF) for keyword spotting in video images without recognizing them. First, a segmentation method extracts words from text lines in each video image. Then we propose a set of texture features for identifying text candidates in the word image with the help of k-means clustering. The proposed method finds the proximity between text candidates to study the spatial arrangement of pixels, which results in feature vectors for spotting words in the input frame. The proposed method is evaluated on word images of different fonts, contrasts, backgrounds and font sizes, chosen from standard databases such as ICDAR 2013 video and our own video data. Experimental results show that the proposed method outperforms existing methods in terms of recall, precision and f-measure.","PeriodicalId":240902,"journal":{"name":"2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133916734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian nonparametric inference of latent topic hierarchies for multimodal data","authors":"Takuji Shimamawari, K. Eguchi, A. Takasu","doi":"10.1109/ACPR.2015.7486501","DOIUrl":"https://doi.org/10.1109/ACPR.2015.7486501","url":null,"abstract":"Research on multimodal data analysis, such as annotated image analysis, is becoming more important than ever due to the increase in the amount of data. One approach to this problem is multimodal topic models as an extension of latent Dirichlet allocation (LDA). Symmetric correspondence topic models (SymCorrLDA) are state-of-the-art multimodal topic models that can appropriately model multimodal data considering inter-modal dependencies. Meanwhile, hierarchically structured categories can help users find relevant data in a large data collection. Hierarchical topic models such as hierarchical latent Dirichlet allocation (hLDA) can discover a tree-structured hierarchy of latent topics from a given unimodal data collection; however, no hierarchical topic models can appropriately handle multimodal data considering inter-modal dependencies. In this paper, we propose h-SymCorrLDA to discover latent topic hierarchies from multimodal data by combining the ideas of the two previously mentioned models: multimodal topic models and hierarchical topic models. We demonstrate the effectiveness of our model compared with several baseline models through experiments with two datasets of annotated images.","PeriodicalId":240902,"journal":{"name":"2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122880004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video-level violence rating with rank prediction","authors":"Yu Wang, Jien Kato","doi":"10.1109/ACPR.2015.7486468","DOIUrl":"https://doi.org/10.1109/ACPR.2015.7486468","url":null,"abstract":"Given a video as input, our objective is to estimate a rating that describes \"how violent it is\". Such an estimate can be used directly in many practical applications, such as shielding children from violent videos. However, due to the unique properties of the rating task, existing approaches to human action recognition and violent scene detection cannot be directly utilized. In this paper, we propose an approach that is specially developed for violence rating. The approach features: (1) a novel video descriptor called the Violent Attribute Activation (VAA) vector, which provides a high-level description of the properties of visual violence; and (2) a rank-prediction-based rating approach, which enforces order constraints in the learning phase. The performance of our approach has been confirmed on a novel dataset prepared for violence rating.","PeriodicalId":240902,"journal":{"name":"2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124438955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stereoscopic image warping for enhancing composition aesthetics","authors":"Md Baharul Islam, L. Wong, Chee-Onn Wong, Kok-Lim Low","doi":"10.1109/ACPR.2015.7486582","DOIUrl":"https://doi.org/10.1109/ACPR.2015.7486582","url":null,"abstract":"The increased popularity of stereo photography, due to the availability of stereoscopic lenses and cameras, has aroused research interest in stereo image editing. In this paper, we present an automatic, aesthetics-based warping approach to recompose the left and right images of a stereo pair simultaneously using a global optimization algorithm. To maximize image aesthetics, we minimize a set of aesthetics errors formulated from selected photographic composition rules during the warping process. In addition, our algorithm attempts to preserve the stereoscopic properties by minimizing disparity change and vertical drift in the resulting image. Experimental results show that our approach successfully relocates salient objects according to the selected photographic rules to enhance compositional aesthetics, and maintains disparity consistency to create a comfortable 3D viewing experience.","PeriodicalId":240902,"journal":{"name":"2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"182 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121058904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient graph spanning structures for large database image retrieval","authors":"B. Mocanu, Ruxandra Tapu, T. Zaharia","doi":"10.1109/ACPR.2015.7486572","DOIUrl":"https://doi.org/10.1109/ACPR.2015.7486572","url":null,"abstract":"In this paper we propose a novel method to improve the performance of image retrieval at the VLAD descriptor level. The system performs image re-ranking based on relational graphs and the neighborhood relations of the top-k candidate results. The technique is able to treat various parts of the graph spanning structures differently by adaptively modifying the similarity scores between images. Because most of the processing is performed offline, our algorithm does not affect the retrieval time. By dealing with the uneven distribution of images in the dataset, the method is effective and increases accuracy without relying on low-level information or on the geometric verification of the considered features.","PeriodicalId":240902,"journal":{"name":"2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114445020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
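The neighborhood-based re-ranking idea in the abstract above can be illustrated with a toy rule (entirely an assumption on our part, not the paper's actual scoring): blend each candidate's similarity to the query with the query similarities of its nearest database neighbors, so a high-scoring but isolated false positive loses rank.

```python
import numpy as np

def rerank(query_sim, db_sim, k=2, alpha=0.5):
    """Re-rank candidates by blending each item's query similarity with
    the query similarities of its k nearest database neighbors."""
    n = len(query_sim)
    new_scores = np.empty(n)
    for i in range(n):
        # k most similar database items to item i, excluding i itself.
        neigh = [j for j in np.argsort(db_sim[i])[::-1] if j != i][:k]
        support = np.mean([query_sim[j] for j in neigh])
        new_scores[i] = alpha * query_sim[i] + (1 - alpha) * support
    return np.argsort(new_scores)[::-1]  # best-first candidate indices

# Item 0 scores highest against the query, but its database neighbors
# (items 3 and 4) do not match the query; items 1 and 2 support each other.
db_sim = np.array([
    [1.0, 0.1, 0.1, 0.9, 0.9],
    [0.1, 1.0, 0.9, 0.1, 0.2],
    [0.1, 0.9, 1.0, 0.2, 0.1],
    [0.9, 0.1, 0.2, 1.0, 0.3],
    [0.9, 0.2, 0.1, 0.3, 1.0],
])
query_sim = np.array([0.9, 0.85, 0.8, 0.2, 0.2])
order = rerank(query_sim, db_sim)
```

After re-ranking, the mutually supporting items 1 and 2 move ahead of the isolated item 0.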
{"title":"Learning clustered sub-spaces for sketch-based image retrieval","authors":"Koustav Ghosal, Ameya Prabhu, Riddhiman Dasgupta, A. Namboodiri","doi":"10.1109/ACPR.2015.7486573","DOIUrl":"https://doi.org/10.1109/ACPR.2015.7486573","url":null,"abstract":"Most traditional sketch-based image retrieval systems compare sketches and images using morphological features. Since these features belong to two different modalities, they are compared either by reducing the image to a sparse, sketch-like form or by transforming the sketches into a denser, image-like representation. However, this cross-modal transformation leads to information loss or adds undesirable noise to the system. We propose a method in which, instead of comparing the two modalities directly, a cross-modal correspondence is established between the images and sketches. Using an extended version of Canonical Correlation Analysis (CCA), the samples are projected onto a lower-dimensional subspace, where the images and sketches of the same class are maximally correlated. We test the efficiency of our method on images from the Caltech and PASCAL datasets and sketches from the TU-BERLIN dataset. Our results show a significant improvement in retrieval performance with the cross-modal correspondence.","PeriodicalId":240902,"journal":{"name":"2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"258 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122084014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
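The CCA projection that the abstract above builds on can be sketched with plain linear algebra. This is textbook two-view CCA, not the paper's extended variant, and the toy "image" and "sketch" features sharing one latent signal are invented for illustration.

```python
import numpy as np

def cca(X, Y, dim=1, reg=1e-6):
    """First `dim` canonical direction pairs for two views X and Y."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])  # regularized covariances
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # SVD of the whitened cross-covariance yields the canonical pairs.
    U, _, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    return Wx @ U[:, :dim], Wy @ Vt[:dim].T

# Two toy "modalities" sharing one latent signal z.
rng = np.random.default_rng(1)
z = rng.normal(size=(200, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])
Y = np.hstack([rng.normal(size=(200, 1)),
               -z + 0.1 * rng.normal(size=(200, 1))])
A, B = cca(X, Y)
u, v = (X @ A).ravel(), (Y @ B).ravel()
```

The projections `u` and `v` recover the shared signal, so their correlation is near 1 in magnitude even though the raw feature spaces are not directly comparable.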
{"title":"Video-based object recognition with weakly supervised object localization","authors":"Yang Liu, R. Kouskouridas, Tae-Kyun Kim","doi":"10.1109/ACPR.2015.7486463","DOIUrl":"https://doi.org/10.1109/ACPR.2015.7486463","url":null,"abstract":"With the number of videos growing rapidly in modern society, automatically recognizing objects from video input becomes increasingly pressing. Videos contain abundant yet noisy information, with easily obtained video-level labels. This paper targets the problem of video-based object recognition, whilst keeping the advantages of videos. We propose a novel algorithm, which only utilizes the weak video-level label in training, iteratively updating the classifier and inferring the object location in each video frame. During testing we obtain more accurate recognition results by inferring the location of the object in the scene. The background and temporal information are also incorporated in the model to improve the discriminability and consistency of recognition in video. We introduce a novel and challenging YouTube dataset to demonstrate the benefits of our method over other baseline methods.","PeriodicalId":240902,"journal":{"name":"2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123229870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accent classification with phonetic vowel representation","authors":"Zhenhao Ge, Ying‑Ying Tan, A. Ganapathiraju","doi":"10.1109/ACPR.2015.7486559","DOIUrl":"https://doi.org/10.1109/ACPR.2015.7486559","url":null,"abstract":"Previous accent classification research focused mainly on detecting accents with pure acoustic information, without recognizing the accented speech. This work combines phonetic knowledge, such as vowels, with acoustic information to build a Gaussian Mixture Model (GMM) classifier with Perceptual Linear Predictive (PLP) features, optimized by Heteroscedastic Linear Discriminant Analysis (HLDA). With about 20 seconds of accented speech as input, this system achieves a classification rate of 51% on a 7-way classification task covering the major types of accents in English, which is competitive with state-of-the-art results in this field.","PeriodicalId":240902,"journal":{"name":"2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131324902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
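A heavily simplified sketch of the GMM classification idea from the abstract above: one full-covariance Gaussian per accent class (i.e. a one-component "GMM"), with synthetic 2-D features standing in for the PLP/HLDA front-end, which is out of scope here. The class structure and data are assumptions for illustration only.

```python
import numpy as np

class GaussianAccentClassifier:
    """One full-covariance Gaussian per class (a one-component 'GMM')."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            cov = np.cov(Xc.T) + 1e-6 * np.eye(X.shape[1])  # regularized
            self.params_[c] = (Xc.mean(axis=0), np.linalg.inv(cov),
                               np.linalg.slogdet(cov)[1])
        return self

    def predict(self, X):
        scores = []
        for c in self.classes_:
            mu, prec, logdet = self.params_[c]
            d = X - mu
            # Gaussian log-likelihood up to a shared additive constant.
            scores.append(-0.5 * (np.einsum('ij,jk,ik->i', d, prec, d)
                                  + logdet))
        return self.classes_[np.argmax(scores, axis=0)]

# Synthetic "accent" features: two well-separated 2-D clusters.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(2.0, 0.3, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
clf = GaussianAccentClassifier().fit(X, y)
```

A real GMM would mix several Gaussians per class and fit them with EM; the single-Gaussian version above keeps the maximum-likelihood decision rule while staying short.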