{"title":"Support Vector Machine based Voice Activity Detection","authors":"M. Baig, S. Masud, Mian M. Awais","doi":"10.1109/ISPACS.2006.364896","DOIUrl":"https://doi.org/10.1109/ISPACS.2006.364896","url":null,"abstract":"Voice activity detection (VAD) is important for efficient speech coding and accurate automatic speech recognition (ASR). Most of the algorithms proposed in the past, for solving the VAD problem, have been based on some deterministic feature of the speech signal such as zero crossing rate. The speech/non-speech decisions are then taken using suitably chosen thresholds. This paper presents the application of support vector machines (SVM) for classifying the voice activity. The speech signal has been divided into labeled overlapping frames and pattern classification has subsequently been performed by using a supervised learning algorithm. It has been observed that the SVM based solution is computationally efficient and provides around 90% accuracy for speech signals directly recorded using a microphone and an accuracy of over 85% for noisy speech","PeriodicalId":178644,"journal":{"name":"2006 International Symposium on Intelligent Signal Processing and Communications","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124348067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sinusoidal Noise Reduction Method Using Leaky LMS Algorithm","authors":"Teppei Washi, A. Kawamura, Y. Iiguni","doi":"10.1109/ISPACS.2006.364892","DOIUrl":"https://doi.org/10.1109/ISPACS.2006.364892","url":null,"abstract":"A technique that uses a prediction error filter for reducing sinusoidal noises from a noisy speech has been proposed previously. Since the prediction error filter can estimate the sinusoidal noise completely, the output becomes zero in a non-speech segment. After the prediction error filter converges, the update of the filter coefficients is stopped. Then the fixed prediction error filter can cancel the sinusoidal noises except for a speech signal in a speech segment. However, frequency characteristics of the filter depend on its prediction algorithm, and the coefficients may converge the values which gives degradation of the speech. In this paper, we propose a new noise reduction algorithm which is a kind of leaky LMS algorithm, so that the prediction error filter removes only the sinusoidal line spectrum without speech degradation","PeriodicalId":178644,"journal":{"name":"2006 International Symposium on Intelligent Signal Processing and Communications","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114593415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 3D Face Recognition System Using Passive Stereo Vision and Its Performance Evaluation","authors":"Akihiro Hayasaka, Koichi Ito, T. Aoki, H. Nakajimat","doi":"10.1109/ISPACS.2006.364908","DOIUrl":"https://doi.org/10.1109/ISPACS.2006.364908","url":null,"abstract":"This paper proposes a three-dimensional (3D) face recognition system using passive stereo vision. So far, the reported 3D face recognition systems assume the use of active 3D measurement for 3D facial capture. However, active methods employ structured illumination or laser scanning, which is not desirable in many human recognition applications. Addressing this problem, we propose a 3D face recognition system that uses (i) AdaBoost-based face detection to automatically extract a face region from an image, (ii) passive stereo vision to capture 3D facial information, and (iii) 3D face matching based on a simple ICP (iterative closest point) algorithm. Experimental evaluation demonstrates an efficient recognition performance of the proposed system","PeriodicalId":178644,"journal":{"name":"2006 International Symposium on Intelligent Signal Processing and Communications","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122120482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generic Modeling Applied to Speaker Count","authors":"A. N. Iyer, U. Ofoegbu, R. Yantorno, B. Y. Smolenski","doi":"10.1109/ISPACS.2006.364898","DOIUrl":"https://doi.org/10.1109/ISPACS.2006.364898","url":null,"abstract":"The problem of determining the number of speakers participating in a conversation and building their models in short conversations, within an unknown group of speakers, is addressed in this paper. The lack of information about the number of speakers and the unavailability of sufficient data present a challenging task of efficiently estimating the speaker model parameters. The proposed method uses a novel generic speaker identification (GSID) system as a guide in the model building process. The GSID system is designed performing speaker identification where the speaker associated with the test data may not be enrolled. The models in the GSID system are employed as initial speaker models, representing the persons participating in the conversation, and are subjected to a classification-adaptation procedure. The classification is performed based on the Bhattacharyya distance between the model database and the test data being analyzed. The model database of the system is designed to consist of simple and well separated models. A technique to generate such generic models is introduced. The proposed method was applied to the speaker count problem and has produced an overall accuracy of 75.3% in determining if there were 1, 2 or 3 speakers in a conversation","PeriodicalId":178644,"journal":{"name":"2006 International Symposium on Intelligent Signal Processing and Communications","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124078835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low Jitter and Wide Range VCO for CD/DVD/Blu-ray Disc","authors":"Takashi Kawamoto, Masaru Kokubo, Shingi Kusakabet, Takehisa Yokohamat","doi":"10.1109/ISPACS.2006.364877","DOIUrl":"https://doi.org/10.1109/ISPACS.2006.364877","url":null,"abstract":"A voltage-controlled oscillator (VCO) for optical-disc-drive (ODD) applications was developed. This VCO selects the most appropriate current-controlled oscillator (CCO) from three CCOs in order to satisfy various ODD read/write speeds. It produces the frequency characteristics of a maximum frequency limiter in order to prevent unlocking of a PLL. Moreover, the VCO applies a trimming method to correct degradation of frequency characteristics due to process variation. A post-layout simulation of the VCO performance, using the TSMC 90-nm CMOS process, was performed, and the simulation results show that the VCO satisfies each of the 52X-CD, 16X-DVD, and 12X-BD frequency specifications with jitter less than 2%","PeriodicalId":178644,"journal":{"name":"2006 International Symposium on Intelligent Signal Processing and Communications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129386128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Realistic Model of VOR Motor Learning with Spiking Cerebellar Cortical Neuronal Network","authors":"Keiichiro Inagaki, Yutaka Hirata","doi":"10.1109/ISPACS.2006.364790","DOIUrl":"https://doi.org/10.1109/ISPACS.2006.364790","url":null,"abstract":"The vestibuloocular reflex (VOR) stabilizes our vision during head movements. The VOR is under adaptive control which requires the cerebellar flocculus, especially its highly organized neuronal network with various synaptic plasticities. To elucidate the signal processing in cerebellar flocculus during VOR adaptation, we constructed a mathematical model in which the cerebellar cortical neuronal network is explicitly described by integrate-and-fire neurons based upon the known anatomy and physiology. Model simulations confirmed that the model reproduces characteristic Purkinje cell simple spike discharge patterns and eye movements during VOR, optokinetic response, and various visual-vestibular mismatch paradigms in normal and flocculectomized monkeys","PeriodicalId":178644,"journal":{"name":"2006 International Symposium on Intelligent Signal Processing and Communications","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129487891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Motion Estimation Block Matching Algorithms for Video Coding","authors":"Mohamed Ghoniem, A. Haggag, Yuzhen Li, Jianming Lu, T. Yahagi, Sekiya Lab","doi":"10.1109/ISPACS.2006.364690","DOIUrl":"https://doi.org/10.1109/ISPACS.2006.364690","url":null,"abstract":"In most block-based video coding systems, the fast block matching algorithms (BMAs) use the origin as the initial search center, which may not track the motion very well. To improve the accuracy of the fast BMAs, a new adaptive motion tracking search algorithm is proposed in this paper. Based on the spatial correlation of motion blocks, a predicted starting search point, which reflects the motion trend of the current block, is adaptively chosen. It does not have the problem of being trapped by local minimum, and is characterized by finding the majority motion vector in one step. When compared with six other block-based search algorithms including the full-search and three-step-search, the new algorithm has an average PSNR very close to that of full search, yet an average search time faster than the three step-search","PeriodicalId":178644,"journal":{"name":"2006 International Symposium on Intelligent Signal Processing and Communications","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128283656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Realization of Nonlinear Acoustic Echo Cancellation by Subband Parallel Cascade Volterra Filter","authors":"Hideyuki Furuhashi, Y. Kajikawa, Y. Nomura","doi":"10.1109/ISPACS.2006.364775","DOIUrl":"https://doi.org/10.1109/ISPACS.2006.364775","url":null,"abstract":"In this paper, we propose low complexity realization of nonlinear acoustic echo cancellation. Generally, it is assumed that the acoustic echo path in hands free telecommunication systems is a linear system. However, the acoustic echo path in modern cellular phones has nonlinearity because the influence of nonlinear distortions of the low cost audio equipment is very large. In order to solve this problem, the nonlinear echo cancellation that includes the linearization system has been proposed. However, there is a problem of having huge computational complexity for convolution between the input signal and the 2nd-order Volterra kernel. Therefore, we propose a nonlinear echo cancellation which consists of subband parallel cascade realization of the 2nd-order Volterra kernel, and examine the validity through simulation results. Simulation results demonstrate that the proposed realization can substantially reduce the computational complexity while maintaining the same echo return loss enhancement as the conventional one.","PeriodicalId":178644,"journal":{"name":"2006 International Symposium on Intelligent Signal Processing and Communications","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128624337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Passband Estimation for Modulated Superlattices Based on Circuit Theory","authors":"K. Asakura, H. Sanada, O. Ogurisu, M. Suzuki","doi":"10.1109/ISPACS.2006.364917","DOIUrl":"https://doi.org/10.1109/ISPACS.2006.364917","url":null,"abstract":"This paper presents that Gaussian superlattices modulated layer thickness form a band structure, which consists of real-passbands, quasi-passbands and stopbands, same as those modulated potential height unlike the periodic superlattices. This paper shows a close relation between the band structure of Gaussian superlattices modulated layer thickness and that of periodic superlattices. According to the relation, the real-passbands of the former can be estimated with simple calculation from the latter effectively evaluated by applying image parameters in circuit theory","PeriodicalId":178644,"journal":{"name":"2006 International Symposium on Intelligent Signal Processing and Communications","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127284187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dental Radiograph Registration Algorithm Using Phase-Based Image Matching for Human Identification","authors":"Akira Nikaido, Koichi Ito, T. Aoki, Eiko Kosuge, R. Kawamata","doi":"10.1109/ISPACS.2006.364907","DOIUrl":"https://doi.org/10.1109/ISPACS.2006.364907","url":null,"abstract":"Dental records are often used to identify victims of massive disasters, where the conventional biometric features, e.g., face, fingerprint, iris, etc., are not available. Human identification using dental records is to match an unidentified individual's postmortem radiographs against a set of identified antemortem radiographs. This paper presents an efficient dental radiograph registration algorithm using phase-based image matching for human identification. The use of phase components in 2D (two-dimensional) discrete Fourier transforms of dental radiograph images makes possible to achieve highly robust image registration and recognition. Experimental evaluation using a small database of dental radiographs indicates that the proposed algorithm exhibits efficient recognition performance for low-quality images","PeriodicalId":178644,"journal":{"name":"2006 International Symposium on Intelligent Signal Processing and Communications","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127176562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}