HMM Based Intermediate Matching Kernel for Classification of Sequential Patterns of Speech Using Support Vector Machines
Authors: A. D. Dileep, C. Sekhar
Journal: IEEE Transactions on Audio Speech and Language Processing
Publication date: 2013-12-01
DOI: 10.1109/TASL.2013.2279338 (https://doi.org/10.1109/TASL.2013.2279338)
In this paper, we address the design of an intermediate matching kernel (IMK) for classifying sequential patterns with a support vector machine (SVM) based classifier, for tasks such as speech recognition. Specifically, we address the construction of a kernel for matching sequences of feature vectors extracted from the speech signals of utterances. The codebook-based IMK and the Gaussian mixture model (GMM) based IMK were proposed earlier for matching varying-length patterns represented as sets of feature vectors, for tasks such as image classification and speaker recognition. These methods use the cluster centers and the GMM components, respectively, as the virtual feature vectors in the design of the IMK. Because they do not use sequence information when matching patterns, they are not suitable for matching sequential patterns. We propose the hidden Markov model (HMM) based IMK for matching sequential patterns of varying length, and consider two approaches to its design. In the first approach, each of the two sequences to be matched is segmented into subsequences, with each subsequence aligned to a state of the HMM. The HMM-based IMK is then constructed as a combination of state-specific GMM-based IMKs that match the subsequences aligned with the corresponding states of the HMM. In the second approach, the HMM-based IMK is constructed without segmenting the sequences, by matching local feature vectors selected using responsibility terms that account for being in a state and for a component of that state's GMM generating the feature vector. We study the performance of SVM-based classifiers using the proposed HMM-based IMK for recognition of isolated utterances of the E-set of the English alphabet and recognition of consonant–vowel segments in Hindi.
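The two constructions described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: the base kernel is Gaussian, the "combination" of state-specific kernels is taken to be a plain sum, the state-specific matching selects frames by nearest virtual vector (a codebook-style simplification rather than GMM responsibilities), and the state alignments and responsibility arrays are assumed to be computed elsewhere (for example by Viterbi decoding and the forward-backward procedure). All function and parameter names are hypothetical.

```python
import numpy as np

def base_kernel(x, y, gamma=0.5):
    """Gaussian base kernel between two local feature vectors (assumed choice)."""
    d = x - y
    return float(np.exp(-gamma * np.dot(d, d)))

def select_by_centers(X, centers):
    """For each virtual feature vector, pick the closest frame of X."""
    # dists[t, q] = squared Euclidean distance between frame t and center q
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return X[dists.argmin(axis=0)]

def imk(X, Y, centers, gamma=0.5):
    """IMK: sum of base kernels over the frames selected per virtual vector."""
    xs = select_by_centers(X, centers)
    ys = select_by_centers(Y, centers)
    return sum(base_kernel(x, y, gamma) for x, y in zip(xs, ys))

def hmm_imk_segmented(X, Y, align_x, align_y, state_centers, gamma=0.5):
    """First approach (sketch): sum of state-specific IMKs over the
    subsequences aligned to each HMM state. align_x[t] is the state index
    of frame t, assumed to be given."""
    total = 0.0
    for i, centers in enumerate(state_centers):
        Xi, Yi = X[align_x == i], Y[align_y == i]
        if len(Xi) == 0 or len(Yi) == 0:
            continue  # assumption: a state missing from either sequence adds nothing
        total += imk(Xi, Yi, centers, gamma)
    return total

def hmm_imk_responsibility(X, Y, resp_x, resp_y, gamma=0.5):
    """Second approach (sketch): no segmentation. resp[t, i, q] is the
    responsibility of component q of state i for frame t (assumed given);
    for each (i, q) the frame with the highest responsibility is matched."""
    best_x = resp_x.reshape(len(X), -1).argmax(axis=0)
    best_y = resp_y.reshape(len(Y), -1).argmax(axis=0)
    return sum(base_kernel(X[a], Y[b], gamma) for a, b in zip(best_x, best_y))
```

Either function returns a scalar for one pair of sequences. Evaluating it over all training pairs gives a Gram matrix that could be passed to an SVM implementation that accepts precomputed kernels.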
About the journal:
The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.