{"title":"An integrated approach for efficient analysis of facial expressions","authors":"M. Ghayoumi, A. Bansal","doi":"10.5220/0005116702110219","DOIUrl":"https://doi.org/10.5220/0005116702110219","url":null,"abstract":"This paper describes a new automated facial expression analysis system that integrates Locality Sensitive Hashing (LSH) with Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) to improve the execution efficiency of emotion classification and the continuous identification of unidentified facial expressions. Images are classified using feature vectors on the two most significant segments of the face: the eye segments and the mouth segment. LSH uses a family of hashing functions to map similar images into a set of collision buckets. Taking a representative image from each cluster reduces the image space by pruning redundant similar images in the collision buckets. The application of PCA and LDA reduces the dimension of the data space. We describe the overall architecture and the implementation. The performance results show that the integration of LSH with PCA and LDA significantly improves computational efficiency, and improves accuracy by reducing the frequency bias of similar images during the PCA and SVM stages. After classifying the images in the database, we tag the collision buckets with basic emotions, and apply LSH to new unidentified facial expressions to identify the emotions. This LSH-based identification is suitable for fast continuous recognition of unidentified facial expressions.","PeriodicalId":438702,"journal":{"name":"2014 International Conference on Signal Processing and Multimedia Applications (SIGMAP)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126654433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Auditory features analysis for BIC-based audio segmentation","authors":"T. Maka","doi":"10.5220/0005063800480053","DOIUrl":"https://doi.org/10.5220/0005063800480053","url":null,"abstract":"Audio segmentation is one of the stages in the audio processing chain whose accuracy plays a primary role in the final performance of audio recognition and processing tasks. This paper presents an analysis of auditory features for audio segmentation. A set of features is derived from a time-frequency representation of the input signal and is calculated based on properties of the human auditory system. The efficiency of several sets of audio features for BIC-based audio segmentation has been analysed. The obtained results show that auditory features derived from different frequency scales are competitive with the widely used MFCC features in terms of accuracy and the number of detected points.","PeriodicalId":438702,"journal":{"name":"2014 International Conference on Signal Processing and Multimedia Applications (SIGMAP)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115926967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D dual-tree discrete wavelet transform based multiple description video coding","authors":"J. Chen, Jie Liao, Yuhang Yang, C. Cai","doi":"10.3233/JCM-160704","DOIUrl":"https://doi.org/10.3233/JCM-160704","url":null,"abstract":"A 3D dual-tree discrete wavelet transform (DT-DWT) based multiple description video coding algorithm is proposed to combat transmission errors or packet loss caused by Internet or wireless network channel failure. Each description of the proposed multiple description coding scheme consists of a base layer and an enhancement layer. First, the input image sequence is encoded by a standard H.264 encoder at a low bit rate to form the base layer, which is then duplicated into each description. Second, the difference between the reconstructed base layer and the input image sequence is encoded by a 3D dual-tree wavelet encoder to produce four coefficient trees. After noise shaping, these four trees are partitioned into two groups, individually forming the enhancement layers of the two descriptions. Since the 3D DT-DWT provides 28 directional subbands, the enhancement layer can be coded without motion estimation. The rich directional selectivity of the DT-DWT solves the mismatch problem and improves coding efficiency. If all descriptions are available at the receiver, a high quality video can be reconstructed by a central decoder. If only one description is received, a side decoder can be used to reconstruct the source with acceptable quality. Simulation results show that the quality of video reconstructed by the proposed algorithm is superior to that of state-of-the-art multiple description video coding methods.","PeriodicalId":438702,"journal":{"name":"2014 International Conference on Signal Processing and Multimedia Applications (SIGMAP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121278969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic letter/pillarbox detection for optimized display of digital TV","authors":"L. Carreira, Tiago Rosa Maria Paula Queluz","doi":"10.5220/0005064202810288","DOIUrl":"https://doi.org/10.5220/0005064202810288","url":null,"abstract":"In this paper we propose a method for the automatic detection of the true aspect ratio of digital video, by detecting the presence and width of horizontal and vertical black bars, also known as letterbox and pillarbox effects. If active format description (AFD) metadata is not present, the proposed method can be used to identify the right AFD and associate it with the video content. If AFD information is present, the method can be used to verify its correctness and to correct it in case of error. Additionally, the proposed method can detect whether relevant information (such as broadcaster logos and hard subtitles) is merged within the black bars and, in the case of subtitles, is able to extract it from the bars and relocate it to the active picture area (allowing letterbox removal).","PeriodicalId":438702,"journal":{"name":"2014 International Conference on Signal Processing and Multimedia Applications (SIGMAP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122670241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lip tracking using particle filter and geometric model for visual speech recognition","authors":"Islem Jarraya, S. Werda, W. Mahdi","doi":"10.5220/0005045601720179","DOIUrl":"https://doi.org/10.5220/0005045601720179","url":null,"abstract":"Automatic lip-reading is a technology that helps in understanding messages exchanged in noisy environments or in the case of hearing impairment in the elderly. Such a system requires three subsystems: a lip locating and tracking system, a labial descriptor extraction system, and a classification and speech recognition system. In this work, we present a spatio-temporal approach to track and characterize lip movements for the automatic recognition of visemes of the French language. First, we segment the lips using color information and a geometric model of the lips. Then, we apply a particle filter to track lip movements. Finally, we propose to extract and classify the visual information to recognize the pronounced viseme. This approach is applied with multiple speakers in natural conditions.","PeriodicalId":438702,"journal":{"name":"2014 International Conference on Signal Processing and Multimedia Applications (SIGMAP)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131076715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smoothed surface transitions for human motion synthesis","authors":"A. Doshi","doi":"10.5220/0005122400730079","DOIUrl":"https://doi.org/10.5220/0005122400730079","url":null,"abstract":"Multiview techniques to reconstruct an animation from 3D video have advanced in leaps and bounds in recent years. It is now possible to synthesise a 3D animation by fusing motions between different sequences. Prior work in this area has established methods to successfully identify inter-sequence transitions of different or similar actions. In some instances, however, the transitions at these nodes in the motion path cause an abrupt change between the motion sequences. Hence, this paper proposes a framework that smooths these inter-sequence transitions while preserving the detailed dynamics of the captured movement. Laplacian-based mesh deformation, in addition to shape- and appearance-based feature methods, including SIFT and MeshHOG features, is used to obtain temporally consistent meshes. These meshes are then interpolated within a temporal window and concatenated to reproduce a seamless transition between the motion sequences. A quantitative analysis of the inter-sequence transitions, evaluated using the three-dimensional shape-based Hausdorff distance, is presented for the synthesised 3D animations.","PeriodicalId":438702,"journal":{"name":"2014 International Conference on Signal Processing and Multimedia Applications (SIGMAP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128964866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a wake-up and synchronization mechanism for Multiscreen applications using iBeacon","authors":"Louay Bassbouss, Görkem Güçlü, S. Steglich","doi":"10.5220/0005121800670072","DOIUrl":"https://doi.org/10.5220/0005121800670072","url":null,"abstract":"TV sets and companion devices (smartphones, tablets, etc.) have outgrown their original purpose and now play an important role together in offering the best user experience on multiple screens. However, the collaboration between TV and companion applications faces challenges that go beyond traditional single-screen applications. These range from discovery, wake-up and pairing of devices, to application launch, communication, synchronization and adaptation to the target device and screen size. In this position paper, we limit ourselves to two of these aspects and introduce an idea for a new wake-up and synchronization mechanism for Multiscreen applications using iBeacon technology.","PeriodicalId":438702,"journal":{"name":"2014 International Conference on Signal Processing and Multimedia Applications (SIGMAP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130894512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Methods and algorithms of cluster analysis in the mining industry: Solution of tasks for mineral rocks recognition","authors":"O. Baklanova, O. Shvets","doi":"10.5220/0005022901650171","DOIUrl":"https://doi.org/10.5220/0005022901650171","url":null,"abstract":"We describe an algorithm for the automatic segmentation of colour images of ores using cluster analysis methods, with examples illustrating its use in solving mineral rock recognition problems. Results of studies with k-means clustering are demonstrated for different colour spaces. A technique for pre-computing the centroid values is proposed, together with formulas for translating metrics to the HSV colour space. The effectiveness of the proposed method lies in the automatic identification of objects of interest in the overall image; the algorithm's tuning parameter is a number indicating the number of segments to be allocated. This paper thus gives a short description of a cluster analysis algorithm for mineral rock recognition in the mining industry.","PeriodicalId":438702,"journal":{"name":"2014 International Conference on Signal Processing and Multimedia Applications (SIGMAP)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134211139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gender classification using M-estimator based radial basis function neural network","authors":"Chien-Cheng Lee","doi":"10.5220/0005117103020306","DOIUrl":"https://doi.org/10.5220/0005117103020306","url":null,"abstract":"A gender classification method using an M-estimator based radial basis function (RBF) neural network is proposed in this paper. In the proposed method, three types of effective features, including facial texture features, hair geometry features, and moustache features, are extracted from a face image. Then, an improved RBF neural network based on an M-estimator is proposed to classify the gender according to the extracted features. The improved RBF network uses an M-estimator to replace the traditional least-mean-square (LMS) criterion to deal with outliers in the data set. The FERET database is used to evaluate our method in the experiment. In the FERET data set, 600 images are chosen, of which 300 are used as training data and the rest as test data. The experimental results show that the proposed method produces good performance.","PeriodicalId":438702,"journal":{"name":"2014 International Conference on Signal Processing and Multimedia Applications (SIGMAP)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131455972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HMM-based breath and filled pauses elimination in ASR","authors":"Piotr Żelasko, T. Jadczyk, B. Ziółko","doi":"10.5220/0005023002550260","DOIUrl":"https://doi.org/10.5220/0005023002550260","url":null,"abstract":"The phenomena of filled pauses and breaths pose a challenge to Automatic Speech Recognition (ASR) systems dealing with spontaneous speech, including recognizer modules in Interactive Voice Response (IVR) systems. We suggest a method based on Hidden Markov Models (HMM), which is easily integrated into HMM-based ASR systems and allows detection of these disturbances without incorporating additional parameters. Our method involves training models of the disturbances and inserting them in the phrase Markov chain between word-final and word-initial phoneme models. Application of the method in our ASR shows improved recognition results on the Polish telephone speech corpus LUNA.","PeriodicalId":438702,"journal":{"name":"2014 International Conference on Signal Processing and Multimedia Applications (SIGMAP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133732087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}