Hand posture recognition using K-NN and Support Vector Machine classifiers evaluated on our proposed HandReader dataset
Ghassem Tofighi, A. Venetsanopoulos, K. Raahemifar, S. Beheshti, Helia Mohammadi
2013 18th International Conference on Digital Signal Processing (DSP), July 2013. DOI: 10.1109/ICDSP.2013.6622679
Abstract: In this paper, we propose a real-time vision-based hand posture recognition approach based on appearance-based features of the hand poses. Our approach has three main steps: Preprocessing, Feature Extraction and Posture Recognition. Additionally, a new hand posture dataset called HandReader is created and introduced. HandReader contains 500 images of 10 different hand postures, namely the 10 non-motion-based American Sign Language alphabet letters, captured against dark backgrounds. The dataset was gathered by capturing images of 50 male and female individuals performing these 10 hand postures in front of a common camera. 20% of the HandReader images are used for training and the remaining 80% are used to test the proposed methodology. All images are normalized in the preprocessing step, and the normalized images are then converted to feature vectors in the Feature Extraction step. To train the system, a k-NN classifier and SVM classifiers with linear and RBF kernels were employed and their results compared. These approaches were used to classify hand posture images into 10 posture classes. The SVM classifier with a linear kernel performed best, with the highest true detection rate (96%) among the evaluated techniques.
Estimating Tremor in Vocal Fold Biomechanics for Neurological Disease Characterization
P. G. Vilda, Victor Nieto Lluis, M. V. R. Biarge, Agustín Álvarez Marquina, Luis Miguel Mazaira-Fernández, R. Martínez, Cristina Muñoz-Mulas, Mario Fernández-Fernández, Carlos Ramírez-Calvo
2013 18th International Conference on Digital Signal Processing (DSP), July 2013. DOI: 10.1109/ICDSP.2013.6622735
Abstract: Neurological diseases (ND) affect larger segments of the aging population every year, and treatment depends on accurate and frequent monitoring, which is expensive. It is well known that ND leave correlates in speech and phonation. The present work presents a method to detect alterations in vocal fold tension during phonation, which may appear either as hypertension or as cyclical tremor. Estimates of tremor may be produced by auto-regressive modeling of the vocal fold tension series in sustained phonation. The correlates obtained are a set of cyclicality coefficients, the frequency, and the root mean square amplitude of the tremor. Statistical distributions of these correlates obtained from a set of male and female subjects are presented. Results from five study cases of female voice are also given.
{"title":"Tex-Lex: Automated generation of texture lexicons using images from the world wide web","authors":"Demetrios Gerogiannis, Christophoros Nikou","doi":"10.1109/ICDSP.2013.6622814","DOIUrl":"https://doi.org/10.1109/ICDSP.2013.6622814","url":null,"abstract":"A method for automatic creation of a semantic texture database is introduced, which exploits the cumulative knowledge that exists in the image tags on the World Wide Web. In the first step of the method, a number of images are retrieved from the Web using the text search option provided by search engines by querying simple notions (e.g. sky, grass water, etc.). These images are segmented into a number of predefined regions using standard clustering and each region is described by a set of image features. The descriptors of the extracted regions of the whole set of images are compared based on the Bhattacharyya distance and the ones that are more similar are considered to be entries of a dictionary associated with the initial keyword used for the query. Moreover, the corresponding regions are parts of the visual lexicon describing the keyword. Also, an already existing lexicon may be iteratively updated by new features that may not match the existing dictionary entries but they are represented over a significant number of query results. Early results on common keywords representing landscapes indicate that the method is promising and may be extended to describe composite structures and objects.","PeriodicalId":180360,"journal":{"name":"2013 18th International Conference on Digital Signal Processing (DSP)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133350921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single-image super-resolution using low complexity adaptive iterative back-projection","authors":"G. Georgis, G. Lentaris, D. Reisis","doi":"10.1109/ICDSP.2013.6622833","DOIUrl":"https://doi.org/10.1109/ICDSP.2013.6622833","url":null,"abstract":"The current paper focuses on single-image super-resolution algorithms aiming at increasing the spatial resolution of images and video sequences. We achieve this goal by decreasing the complexity of the reconstruction process, combining common filtering methods and introducing adaptive error back-projection throughout an iterative back-projection framework. We compare our scheme with common interpolation algorithms and other single-image super-resolution techniques. The results of this work can be used to improve the performance of space and resource-limited implementations.","PeriodicalId":180360,"journal":{"name":"2013 18th International Conference on Digital Signal Processing (DSP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113966178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gait-based gender recognition using pose information for real time applications
Dimitris Kastaniotis, Ilias Theodorakopoulos, G. Economou, S. Fotopoulos
2013 18th International Conference on Digital Signal Processing (DSP), July 2013. DOI: 10.1109/ICDSP.2013.6622766
Abstract: Biological cues inherent in human motion play an important role in the context of social communication. While recognizing the gender of other people is important for humans, security, advertisement and population statistics systems could also benefit from this kind of information. In this work, for the first time, we propose a method suitable for real-time gait-based gender recognition relying on poses estimated from depth images. We provide evidence that pose-based representations estimated from depth images can greatly benefit the problem of gait analysis. Given a gait sequence, in every frame the dynamics of gait motion are encoded using an angular representation. In particular, several skeletal primitives are each expressed as two Euler angles that cast votes into aggregated histograms. These histograms are then normalized, concatenated and projected onto a PCA basis to form the final sequence descriptor. We evaluated our method on a newly created dataset, UPCVgait, captured with Microsoft Kinect and consisting of 5 gait sequences performed by 30 subjects. An RBF-kernel SVM, used for classification in a leave-one-person-out scheme on gait sequences of arbitrary length as well as on a variable number of frames, confirms the efficiency of our method.
{"title":"Neural network target classification for Concealed Weapon radar detection","authors":"A. Vasalos, N. Uzunoglu, H. Ryu, I. Vasalos","doi":"10.1109/ICDSP.2013.6622819","DOIUrl":"https://doi.org/10.1109/ICDSP.2013.6622819","url":null,"abstract":"The concept of Concealed Weapon and Explosive (CWE) detection by the analysis of the Late Time Response (LTR) of the complex human-CWE object in UWB Radar, has been presented in [1,2]. As the overall reflected human signal depends on the human stance and orientation with respect to the radar system, this paper investigates whether the resonant frequencies can be classified according to the illuminated simple i.e. human or complex i.e. human-CWE object. This classification yields that the human frequencies do not overlap with the CWE signature frequencies therefore the CWE frequencies can be obtained and the body-worn CWE detection is realised. The resonant frequency classification is achieved via a Learning Vector Quantization (LVQ) network.","PeriodicalId":180360,"journal":{"name":"2013 18th International Conference on Digital Signal Processing (DSP)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114716254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Experiments on far-field multichannel speech processing in smart homes
I. Rodomagoulakis, Panagiotis Giannoulis, Z.-I. Skordilis, P. Maragos, G. Potamianos
2013 18th International Conference on Digital Signal Processing (DSP), July 2013. DOI: 10.1109/ICDSP.2013.6622707
Abstract: In this paper, we examine three problems that arise in the modern, challenging area of far-field speech processing. The methods developed for each problem, namely (a) multi-channel speech enhancement, (b) voice activity detection, and (c) speech recognition, are potentially applicable to a distant speech recognition system for voice-enabled smart home environments. The results obtained on real and simulated data for these smart home speech applications are quite promising, owing to the improvements made in the employed signal processing methods.
{"title":"Active contour model driven by Globally Signed Region Pressure Force","authors":"M. Abdelsamea, S. Tsaftaris","doi":"10.1109/ICDSP.2013.6622691","DOIUrl":"https://doi.org/10.1109/ICDSP.2013.6622691","url":null,"abstract":"One of the most popular and widely used global active contour models (ACM) is the region-based ACM, which relies on the assumption of homogeneous intensity in the regions of interest. As a result, most often than not, when images violate this assumption the performance of this method is limited. Thus, handling images that contain foreground objects characterized by multiple intensity classes present a challenge. In this paper, we propose a novel active contour model based on a new Signed Pressure Force (SPF) function which we term Globally Signed Region Pressure Force (GSRPF). It is designed to incorporate, in a global fashion, the skewness of the intensity distribution of the region of interest (ROI). It can accurately modulate the signs of the pressure force inside and outside the contour, it can handle images with multiple intensity classes in the foreground, it is robust to additive noise, and offers high efficiency and rapid convergence. The proposed GSRPF is robust to contour initialization and has the ability to stop the curve evolution close to even ill-defined (weak) edges. Our model provides a parameter-free environment to allow minimum user intervention, and offers both local and global segmentation properties. Experimental results on several synthetic and real images demonstrate the high accuracy of the segmentation results in comparison to other methods adopted from the literature.","PeriodicalId":180360,"journal":{"name":"2013 18th International Conference on Digital Signal Processing (DSP)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122052774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial sound rendering for dynamic virtual environments","authors":"B. Cowan, B. Kapralos","doi":"10.1109/ICDSP.2013.6622815","DOIUrl":"https://doi.org/10.1109/ICDSP.2013.6622815","url":null,"abstract":"We present the details of a virtual sound rendering engine (VSRE) that is being developed for virtual environments and serious games. The VSRE incorporates innovative graphics processing unit-based methods to allow for the approximation of acoustical occlusion/diffraction and reverberation effects at interactive rates. In addition, the VSRE includes a GPU-based method that performs the one-dimensional convolution allowing for the incorporation of head-related transfer functions also at interactive rates. The VSRE is being developed as a research tool for examining multi-modal (audio-visual) interactions through the simple manipulation of the acoustic environment and audio parameters (sound quality), that will, through a series of human-based experiments, allow for the testing of the effect of varying these parameters may have on immersion, engagement, and visual fidelity perception within a virtual environment. Finally, we also provide a running time comparison of several one-dimensional convolution implementations.","PeriodicalId":180360,"journal":{"name":"2013 18th International Conference on Digital Signal Processing (DSP)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125741329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-stage audio-visual speech dereverberation and separation based on models of the interaural spatial cues and spatial covariance","authors":"Muhammad Salman Khan, S. M. Naqvi, J. Chambers","doi":"10.1109/ICDSP.2013.6622780","DOIUrl":"https://doi.org/10.1109/ICDSP.2013.6622780","url":null,"abstract":"This work presents a two-stage speech source separation algorithm based on combined models of interaural cues and spatial covariance which utilize knowledge of the locations of the sources estimated through video. In the first pre-processing stage the late reverberant speech components are suppressed by a spectral subtraction rule to dereverberate the observed mixture. In the second stage, the binaural spatial parameters, the interaural phase difference and the interaural level difference, and the spatial covariance are modeled in the short-time Fourier transform (STFT) domain to classify individual time-frequency (TF) units to each source. The parameters of these probabilistic models and the TF regions assigned to each source are updated with the expectation-maximization (EM) algorithm. The algorithm generates TF masks that are used to reconstruct the individual speech sources. Objective results, in terms of the signal-to-distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ), confirm that the proposed multimodal method with pre-processing is a promising approach for source separation in highly reverberant rooms.","PeriodicalId":180360,"journal":{"name":"2013 18th International Conference on Digital Signal Processing (DSP)","volume":"11 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123682649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}