Authors: Kristiawan Nugroho, Isworo Nugroho, De Rosal Ignatius Moses Setiadi, Omar Farooq
Journal: IAES International Journal of Artificial Intelligence (journal article, published 2023-01-01)
DOI: 10.11591/ijai.v12.i4.pp1901-1908 (https://doi.org/10.11591/ijai.v12.i4.pp1901-1908)
Improving Indonesian multiethnic speaker recognition using pitch shifting data augmentation
Recognizing speakers from multiple ethnic groups is an active research topic, and studies that span many ethnicities need a suitable approach to achieve good model performance. Deep learning has been applied to speaker recognition tasks with many classes and has produced high accuracy. However, multi-class and imbalanced datasets remain obstacles for deep learning methods, because they cause overfitting and reduce accuracy. Data augmentation is one way to address both the shortage of data and the multi-class problem; depending on the method applied, it can improve the quality of the research data. This study proposes a data augmentation method that combines pitch shifting with a deep neural network, called pitch shifting data augmentation deep neural network (PSDA-DNN), to identify multiethnic Indonesian speakers. The results show that the PSDA-DNN approach performs best for multiethnic speaker recognition, reaching an accuracy of 99.27% and a precision, recall, and F1 score of 97.60%.
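The idea of pitch shifting as data augmentation can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pitch_shift_resample` and `augment_dataset` are hypothetical, the semitone offsets of -2 and +2 are illustrative values that do not come from the abstract, and the shift is done by plain resampling with linear interpolation, which changes the clip duration as well as the pitch. Production pipelines usually rely on a duration-preserving routine such as `librosa.effects.pitch_shift`.

```python
def pitch_shift_resample(samples, n_steps):
    """Shift pitch by n_steps semitones by resampling a list of floats.

    Simplification (assumption): reading the signal faster raises the
    pitch but also shortens the clip by a factor of 1 / rate.
    """
    if not samples:
        return []
    # One semitone corresponds to a frequency ratio of 2 ** (1 / 12).
    rate = 2.0 ** (n_steps / 12.0)
    out_len = max(1, int(len(samples) / rate))
    shifted = []
    for i in range(out_len):
        pos = i * rate                      # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - lo
        # Linear interpolation between the two neighbouring samples.
        shifted.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return shifted


def augment_dataset(clips, labels, steps=(-2, 2)):
    """Return the original clips plus one shifted copy per offset in steps.

    Each shifted copy keeps the speaker label of its source clip, so every
    class grows by the same factor. The default offsets are illustrative.
    """
    aug_clips, aug_labels = list(clips), list(labels)
    for clip, label in zip(clips, labels):
        for n_steps in steps:
            aug_clips.append(pitch_shift_resample(clip, n_steps))
            aug_labels.append(label)
    return aug_clips, aug_labels
```

With two offsets, each original clip yields two extra training examples, so the dataset triples in size. To counter class imbalance specifically, the same function could be applied only to the clips of under-represented classes.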