Moakala Tzudir, Shikha Baghel, Priyankoo Sarmah, S. Prasanna
2022 National Conference on Communications (NCC). Published 2022-05-24. DOI: 10.1109/NCC55593.2022.9806770 (https://doi.org/10.1109/NCC55593.2022.9806770)
Analyzing RMFCC Feature for Dialect Identification in Ao, an Under-Resourced Language
Ao is a low-resource tone language of the Tibeto-Burman family, spoken in Nagaland in the North-East of India. It has three tones (high, mid and low) and three distinct dialects: Chungli, Mongsen and Changki. The objective of a dialect identification system is to identify a speech variety within a language. This paper presents an automatic dialect identification system for Ao based on excitation source features. The goal of the study is to determine whether an excitation source feature, the Residual Mel Frequency Cepstral Coefficients (RMFCC), can be exploited to discriminate the three dialects of Ao automatically. Vocal tract system features, namely Mel Frequency Cepstral Coefficients (MFCC) and Shifted Delta Cepstral (SDC) coefficients, serve as the baseline. RMFCC features are obtained from the Linear Prediction (LP) residual signal, whereas MFCC features are derived from the smoothed spectrum of the speech signal; SDC coefficients are explored to provide additional temporal information. The system is evaluated on trisyllabic words uttered by 36 speakers for the three dialects of Ao, with a Gaussian Mixture Model (GMM) based classifier. Dialect identification accuracy is higher when all three features are combined.
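The two feature-side steps named in the abstract, the LP residual that underlies RMFCC and the SDC stacking of cepstral frames, can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names (`lpc`, `lp_residual`, `sdc`), the LP order of 10, and the SDC configuration d=1, P=3, k=7 are assumptions chosen for the example, since the abstract does not state them. The mel filterbank and cepstral step that would turn the residual into RMFCC, and the GMM classifier, are not shown.

```python
def autocorr(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def lpc(frame, order):
    """LP coefficients a_1..a_order via the Levinson-Durbin recursion,
    so that s[n] is predicted as sum_k a_k * s[n-k]."""
    r = autocorr(frame, order)
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]                       # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:               # silent or perfectly predicted frame
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(frame, order=10):
    """Residual e[n] = s[n] - sum_k a_k * s[n-k]; order=10 is an assumed value.
    RMFCC would be obtained by running an MFCC pipeline on this residual."""
    a = lpc(frame, order)
    residual = []
    for n in range(len(frame)):
        pred = sum(a[k] * frame[n - 1 - k] for k in range(min(order, n)))
        residual.append(frame[n] - pred)
    return residual


def sdc(cepstra, d=1, p=3, k=7):
    """Shifted delta cepstra: for each frame t, stack k delta blocks
    c[t + i*p + d] - c[t + i*p - d], i = 0..k-1 (assumed d=1, P=3, k=7).
    Frame indices beyond the utterance are clamped to its edges."""
    last = len(cepstra) - 1
    out = []
    for t in range(len(cepstra)):
        vec = []
        for i in range(k):
            ahead = cepstra[max(0, min(last, t + i * p + d))]
            behind = cepstra[max(0, min(last, t + i * p - d))]
            vec.extend(x - y for x, y in zip(ahead, behind))
        out.append(vec)
    return out
```

Calling `lp_residual(frame)` on a windowed frame of samples returns a residual of the same length, and `sdc(cepstra)` maps a list of N-dimensional cepstral frames to a list of (N x k)-dimensional vectors. In practice an array library would replace the explicit loops; they are kept here only so that each recursion step is visible.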