Shilpa Sharma, Anurag Sharma, R. Malhotra, Punam Rattan
2021 2nd International Conference on Intelligent Engineering and Management (ICIEM), published 2021-04-28. DOI: https://doi.org/10.1109/ICIEM51511.2021.9445371
Voice Activity Detection using windowing and updated K-Means Clustering Algorithm
Voice Activity Detection (VAD) is the task of distinguishing speech from non-speech segments in noisy environments, and various methods have been proposed for it. In general, the research divides into supervised and unsupervised speech recognition approaches, which have produced a range of algorithms for detecting the presence of a speech signal. This work aims to examine window overlapping and the detection of speech and non-speech segments. A speech signal is slowly varying and non-stationary, but its characteristics are approximately constant when examined over a short span of time (between 10 and 30 ms). As a result, the signal is split into windowed frames so that its characteristics can be analyzed frame by frame. However, a broader study is still required on the selection among existing VAD techniques and on the problems and opportunities for further research in this emerging area. The advantage of the new unsupervised K-means approach over supervised methods is that it requires neither pre-trained classifiers nor prior knowledge about the audio streams.
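The abstract does not give the details of the updated K-means algorithm, so the following is only a minimal sketch of the general idea: overlapping windowed frames followed by unsupervised two-cluster K-means. The 25 ms frame length, 10 ms hop, Hamming window, log-energy feature, and all function names are assumptions chosen for illustration, not the authors' method.

```python
import numpy as np

def frame_signal(x, sr, frame_ms=25.0, hop_ms=10.0):
    """Split x into overlapping Hamming-windowed frames (assumed sizes)."""
    frame_len = int(round(sr * frame_ms / 1000.0))
    hop = int(round(sr * hop_ms / 1000.0))
    if len(x) < frame_len:
        # Zero-pad very short inputs so at least one frame exists.
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def log_energy(frames):
    """Short-time log energy per frame (assumed feature)."""
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def kmeans_1d_two_clusters(values, n_iter=50):
    """Plain K-means with k=2 on scalar features.

    Centers start at the min and max, so cluster 1 is the
    high-energy cluster.
    """
    centers = np.array([values.min(), values.max()], dtype=float)
    for _ in range(n_iter):
        # Assign each value to the nearer center.
        dist0 = np.abs(values - centers[0])
        dist1 = np.abs(values - centers[1])
        labels = (dist1 < dist0).astype(int)
        # Recompute centers, keeping the old one if a cluster is empty.
        new_centers = centers.copy()
        for k in (0, 1):
            if np.any(labels == k):
                new_centers[k] = values[labels == k].mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def vad(x, sr):
    """Return a boolean mask per frame: True where the frame is in the high-energy cluster."""
    frames = frame_signal(np.asarray(x, dtype=float), sr)
    labels, _ = kmeans_1d_two_clusters(log_energy(frames))
    return labels.astype(bool)
```

Calling `vad(x, sr)` on a mono signal `x` sampled at `sr` Hz returns one boolean per frame. No classifier is trained in advance and no labels are needed, which is the advantage the abstract attributes to the unsupervised approach. A single energy feature is a simplification: it will label any loud segment as speech, so a practical system would likely add further features.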