{"title":"Abnormal behavior detection using Conditional Random Fields","authors":"Ben-Syuan Huang, Shih-Chung Hsu, Chung-Lin Huang","doi":"10.1109/ICALIP.2016.7846542","DOIUrl":"https://doi.org/10.1109/ICALIP.2016.7846542","url":null,"abstract":"This paper proposes a real-time abnormal behavior detection method using Conditional Random Fields (CRFs). A normal behavior can be characterized by the spatial and temporal features obtained from the video of human activities. The difficulty of abnormal behavior detection is that human behavior varies in both motion and appearance. It is a continuous action stream, interspersed with transitional activities between abnormal and normal events. Here, we propose Bag of Words (BoWs) to describe the motion information as the observations. Then, we apply the CRFs and adaptive thresholding to identify the abnormal behaviors. Different from previous methods, our method can identify the undefined and unknown abnormal activities.","PeriodicalId":184170,"journal":{"name":"2016 International Conference on Audio, Language and Image Processing (ICALIP)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116426182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel i-vector framework using multiple features and PCA for speaker recognition in short speech condition","authors":"Chi Zhang, Xiaoqiang Li, Wei Li, Peizhong Lu, Wenqiang Zhang","doi":"10.1109/ICALIP.2016.7846558","DOIUrl":"https://doi.org/10.1109/ICALIP.2016.7846558","url":null,"abstract":"Speaker recognition in short speech condition is a difficult topic because the length of training and test speech is very short. One of the main disadvantage of the existing methods for speaker recognition is that they need very sufficient data and it's usually impossible in reality applications. In our experiments, the conventional methods with single feature don't make good performance in short speech. We propose a novel i-vector framework using multiple features and Principal Component Analysis (PCA) in short speech condition to overcome this difficulty, as multiple features combination can represent more aspects of a speaker. PCA is used to map the multiple features to an uncorrelated and orthogonal basis set to meet the requirements of Gaussian Mixture Model (GMM) with diagonal covariance matrices and i-vector. Improvement from the proposed approach compared to a state-of-the-art system are of roughly 50% relative at equal error rate when evaluated on the telephone conditions from the 2010 NIST speaker recognition evaluation (SRE).","PeriodicalId":184170,"journal":{"name":"2016 International Conference on Audio, Language and Image Processing (ICALIP)","volume":"165 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114574884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Influence of the number of loudspeakers on the timbre in horizontal Ambisonics reproduction","authors":"Mai Haiming, Jiang Jianliang, Xie Bosun, Rao Dan, Liu Yang","doi":"10.1109/ICALIP.2016.7846604","DOIUrl":"https://doi.org/10.1109/ICALIP.2016.7846604","url":null,"abstract":"Ambisonics is a series of flexible spatial sound systems based on spatial harmonics decomposition of sound field. It is able to reconstruct sound field accurately within some region and below some frequency limit which are imposed by Shannon-Nyquist spatial sampling theorem and determined by the order of Ambisonics. Above the Shannon-Nyquist limit of spatial sampling, errors occur in the reconstructed sound field, resulting audible artifacts. By using Moore's revised loudness model, the present work analyzes the influence of number of loudspeakers on timbre in horizontal Ambisonics reproduction. The binaural loudness level spectra (BLLS) of Ambisonics reproduction are calculated and then compared with those of target sound field. The results indicate that below the Shannon-Nyquist limit of spatial sampling, the BLLS for Ambisonics reproduction match well with those of target sound field and thus no timbre change occurs. Above the limit, however, the BLLS for Ambisonics reproduction deviate from those of target sound field. The extent of deviation depends on both the direction of target sound field and the number of loudspeakers. Increasing the number of loudspeakers in Ambisonics reproduction may increase the change of BLLS in some case, but may reduce the change of BLLS in some other case.","PeriodicalId":184170,"journal":{"name":"2016 International Conference on Audio, Language and Image Processing (ICALIP)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114653625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An optimized license plate recognition system for complex situations","authors":"Jianing Qiu, Naida Zhu, Yi Wei, Xiaoqing Yu","doi":"10.1109/ICALIP.2016.7846647","DOIUrl":"https://doi.org/10.1109/ICALIP.2016.7846647","url":null,"abstract":"This paper optimizes traditional license plate recognition system (LPRS) and for each original part with deficiencies, we give our own novel methods to refine them, such as integrating color and edge detection to increase the success rate of locating LPs as well as employing connected component analysis and vertical projection method alternatively to make segmentation more precise and efficient. For character recognition, we apply improved K-Nearest Neighbors algorithm and introduce some novel feature vectors to improve the accuracy of recognition. Our experimental results indicate that the optimized system has a high LP recognition rate with the accuracy of 96.75%.","PeriodicalId":184170,"journal":{"name":"2016 International Conference on Audio, Language and Image Processing (ICALIP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114715679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on SPA-based target acoustic signal trend term removal","authors":"Ding Kai, Li Hao, Zhu Yichao, Qiu Shuang","doi":"10.1109/ICALIP.2016.7846568","DOIUrl":"https://doi.org/10.1109/ICALIP.2016.7846568","url":null,"abstract":"When intelligent fuse is detecting target acoustic signal, affected by external interference and other factors, the signal may bring on trend term. Existence of trend term may cause direct influence on the accuracy of target acoustic signal, so that it should be removed. In allusion to the problems in existing trend term removal methods, such as trend term type should be assumed in advance, and the calculation process is quite complicated, a smoothness priors approach based method is proposed to remove trend term of target acoustic signal. 3 categories of target acoustic signals were processed in the research. According to the results, the approach is effective in removing trend term of time domain waveform and power spectrum of target acoustic signal. Moreover, the method is simple and efficient, and is applicable to the trend term removal preprocessing of target sound signal.","PeriodicalId":184170,"journal":{"name":"2016 International Conference on Audio, Language and Image Processing (ICALIP)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134139521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compressive-signal annotation driven by a supervised topic-clustering BoF model","authors":"J. Zheng, Lihong Ma, Xiaoer Wang","doi":"10.1109/ICALIP.2016.7846580","DOIUrl":"https://doi.org/10.1109/ICALIP.2016.7846580","url":null,"abstract":"This paper presents a new Bag-of-Features model (BoF) to enhance the efficiency of automatic image annotation. Since the traditional BoF ignores the semantic of its vocabularies, it cannot be seen as descriptive representation of images in many image applications. To handle this critical limitation, firstly, we propose the RGB compressive texton. By using compressive sensing theory, the image can be compressed and its key information can be kept. Secondly, according to the topic of images, we extract RGB compressive texton from image of the same topic. Thirdly, the cluster algorithm is use to form clustering centers of each topic. Finally, using all topics cluster center to form new visual vocabularies of BoF model. Therefore each vocabulary has its semantics, which includes the topic information of images. We refer to such new BoF model as supervised topic-clustering BoF model. Experiments on automatic image annotation with a benchmark datasets Corel-5K show promising performance.","PeriodicalId":184170,"journal":{"name":"2016 International Conference on Audio, Language and Image Processing (ICALIP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116795421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gesture recognition based on parallel hardware neural network implemented with stochastic logics","authors":"Xuechun Wang, Wendong Chen, Yuan Ji, F. Ran","doi":"10.1109/ICALIP.2016.7846555","DOIUrl":"https://doi.org/10.1109/ICALIP.2016.7846555","url":null,"abstract":"A new method based on neural network using stochastic computing is presented for the recognition of human gesture. In the current gesture recognition study, most of the technologies require high hardware resources and power consumption. Considering gesture recognition algorithms, the power limitations of their complex systems have encouraged designers toward searching for a reconfigurable architecture, stochastic computing. For different neural networks with complex arithmetic operations, computation on stochastic bit streams costs fewer resources and performs very efficient in operation. The experimental results demonstrate that the stochastic neural network could recognize different hand gesture effectively and take less hardware area. Even more, it has good robustness to the different environments.","PeriodicalId":184170,"journal":{"name":"2016 International Conference on Audio, Language and Image Processing (ICALIP)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124839347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Person re-identification using multiple features fusion","authors":"Kang Han, W. Wan, Guoliang Chen, Li Hou","doi":"10.1109/ICALIP.2016.7846660","DOIUrl":"https://doi.org/10.1109/ICALIP.2016.7846660","url":null,"abstract":"In this paper, we propose combined visual features for person re-identification. Our features are based on the multiple hand-crafted visual features. The proposed features are a combination of histogram from the RGB, YUV and HSV color channels, LBP and SIFT features. Then we use different distance metric learning methods to measure the similarity of the same persons and different persons. Experimental results demonstrate that the combined features have discriminative power for person re-identification.","PeriodicalId":184170,"journal":{"name":"2016 International Conference on Audio, Language and Image Processing (ICALIP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125471500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Path planning algorithm under specific constraints in weighted directed graph","authors":"Qiyun Sun, W. Wan, Guoliang Chen, Xiang Feng","doi":"10.1109/ICALIP.2016.7846631","DOIUrl":"https://doi.org/10.1109/ICALIP.2016.7846631","url":null,"abstract":"At present, most of the path planning algorithms are only aimed at reaching the end point from the starting point. However, in practical applications, the existence of a variety of constraints increases the difficulty of path planning, so the Dijkstra algorithm, A* algorithm and other classical algorithms become no longer applicable in such a situation. In this paper, considering the various constraints in practical problems, we abstracted out such kind of path planning problem: In a weighted directed graph, paths need to be found which start from a source node, after passing through some specified intermediate nodes without repetition, and finally ends at a specified termination node. Based on Dijkstra algorithm and heuristic search principle, a feasible path planning algorithm is proposed with the new concept of “inspired hop count” to balance the path weight and search time.","PeriodicalId":184170,"journal":{"name":"2016 International Conference on Audio, Language and Image Processing (ICALIP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115596246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improvements in HRTF dataset of 3D game audio application","authors":"Ruixing Wu, Guang-zheng Yu","doi":"10.1109/ICALIP.2016.7846601","DOIUrl":"https://doi.org/10.1109/ICALIP.2016.7846601","url":null,"abstract":"OpenAL-Soft is a 3D audio application programming interface (API), which can be used to render the 3D virtual soundscape through sound signal processing based on the head-related transfer functions (HRTFs). However, the performance is still unsatisfied because of the oversimplified HRTF dataset that is left-right symmetrical and incomplete in spatial sampling. In order to improve the 3D sound reproduction performance of game audio, in current work, the left-right asymmetric HRTFs with full space sampling are adopted. Simulated results show that the more accurate magnitude spectra can be rendered by the improved HRTF dataset. The enlarged HRTF dataset could increase the memory footprint, but computation expense varies little.","PeriodicalId":184170,"journal":{"name":"2016 International Conference on Audio, Language and Image Processing (ICALIP)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131675165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}