Exploiting Convolutional Neural Networks for Phonotactic Based Dialect Identification

M. Najafian, Sameer Khurana, Suwon Shon, Ahmed Ali, James R. Glass

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5174-5178. DOI: 10.1109/ICASSP.2018.8461486. Published 2018-09-10.
In this paper, we investigate different approaches to Dialect Identification (DID) in Arabic broadcast speech. Dialects differ in their inventories of phonological segments. This paper proposes a new phonotactic-based feature representation that enables discrimination among occurrences of the same phone n-gram with different phone duration and probability statistics. To gain further accuracy, we use multilingual phone recognizers trained separately on Arabic, English, Czech, Hungarian, and Russian. We use Support Vector Machines (SVMs) and Convolutional Neural Networks (CNNs) as backend classifiers throughout the study. The final system fusion yields 24.7% and 19.0% relative error rate reductions compared to a conventional phonotactic DID system and to i-vectors with bottleneck features, respectively.
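The duration-aware phonotactic idea above can be sketched loosely as follows. This is a minimal illustration, not the paper's actual pipeline: it assumes each utterance arrives as a list of (phone, duration) pairs from a phone recognizer, and it weights each n-gram count by the total duration of its phones so that two occurrences of the same n-gram with different timing contribute differently. All names and the weighting scheme are hypothetical.

```python
from collections import Counter

def phone_ngram_features(utterance, n=2):
    """Duration-weighted phone n-gram counts (illustrative sketch).

    utterance: list of (phone, duration_seconds) pairs, e.g. from a
    phone recognizer. Each n-gram occurrence contributes the summed
    duration of its phones rather than a plain count of 1, so distinct
    occurrences of the same n-gram are no longer indistinguishable.
    """
    phones = [p for p, _ in utterance]
    durations = [d for _, d in utterance]
    feats = Counter()
    for i in range(len(phones) - n + 1):
        ngram = " ".join(phones[i:i + n])
        feats[ngram] += sum(durations[i:i + n])  # duration-weighted count
    return feats

# Toy utterance: the bigram "b a" occurs twice with different durations,
# so its feature value reflects both occurrences' timing.
utt = [("b", 0.08), ("a", 0.12), ("b", 0.06), ("a", 0.20)]
print(phone_ngram_features(utt, n=2))
```

In a real system these sparse feature vectors would then be fed to a backend classifier such as an SVM or CNN, as the abstract describes.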
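The final fusion of systems built on different phone recognizers can be sketched as a simple late fusion over per-dialect score matrices. This is an assumed illustration (score averaging), not necessarily the fusion method used in the paper; the arrays and function name are hypothetical.

```python
import numpy as np

def fuse_scores(system_scores):
    """Late fusion by score averaging (illustrative sketch).

    system_scores: list of (n_utterances, n_dialects) score arrays,
    one per subsystem (e.g. per phone recognizer). Returns the fused
    predicted dialect index for each utterance.
    """
    fused = np.mean(np.stack(system_scores), axis=0)
    return fused.argmax(axis=1)

# Two toy subsystems scoring two utterances over two dialects.
s1 = np.array([[0.9, 0.1], [0.4, 0.6]])
s2 = np.array([[0.6, 0.4], [0.2, 0.8]])
print(fuse_scores([s1, s2]))
```

Averaging is the simplest fusion rule; weighted combinations trained on held-out data are a common refinement when subsystems differ in reliability.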