MARS: A Hybrid Deep CNN-based Multi-Accent Recognition System for English Language

2022 First International Conference on Artificial Intelligence Trends and Pattern Recognition (ICAITPR). Published: 2022-03-10. DOI: 10.1109/ICAITPR51569.2022.9844177

S. Darshana, H. Theivaprakasham, G. Jyothish Lal, B. Premjith, V. Sowmya, Kp Soman

Citations: 2

Abstract

Classifying the speech of non-native English speakers is challenging because many features distinguish one accent from another. Accents vary with sex, age, formality, social status, geographical area, mother tongue, voice quality, phonemes, and prosody. This paper proposes a novel, well-structured database of Indian non-native English speaker accents, referred to as IndicAccentDB. IndicAccentDB contains speech samples from 6 different Indian states and is designed to address the gender imbalance and speaker-mismatch problems observed in earlier datasets. The paper also discusses the requirements for creating IndicAccentDB and the pre-processing performed on the dataset. Furthermore, to build a robust Multi-Accent Recognition System (MARS), we experimented with several accent classification models, namely a 1D-CNN, Support Vector Machines, Random Forest, Decision Tree, ResNet18, ResNet50, and xResNet18, using MFCC and Mel-spectrogram features. Finally, we evaluated the proposed models on the new database and compared the results using accuracy, precision, recall, and F1-score. Based on our findings, xResNet18 identified the accent classes with high accuracy.
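The models are compared by accuracy, precision, recall, and F1-score. The function below is a minimal, hypothetical sketch of how these metrics can be computed for a multi-class accent classifier from lists of true and predicted labels. It is not the authors' evaluation code. It assumes macro averaging (an unweighted mean over classes), because the abstract does not say which averaging was used. The function name `classification_metrics` and the example labels are illustrative only.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Assumption: macro averaging (unweighted mean over all classes seen in
    y_true or y_pred); the paper may have used a different averaging.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    pred_count = Counter(y_pred)  # TP + FP per class
    true_count = Counter(y_true)  # TP + FN per class

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # A class that is never predicted (or never present) scores 0 here.
        p = tp[c] / pred_count[c] if pred_count[c] else 0.0
        r = tp[c] / true_count[c] if true_count[c] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Usage with three hypothetical accent labels (not IndicAccentDB class names).
y_true = ["accent_a", "accent_a", "accent_b", "accent_b", "accent_c", "accent_c"]
y_pred = ["accent_a", "accent_b", "accent_b", "accent_b", "accent_c", "accent_a"]
print(classification_metrics(y_true, y_pred))
```

Macro averaging gives each accent class equal weight regardless of how many samples it has. That is one reason a class-balanced database matters when these scores are compared. In practice, scikit-learn's `sklearn.metrics.precision_recall_fscore_support` with `average="macro"` computes the same precision, recall, and F1 quantities.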
