MARS: A Hybrid Deep CNN-based Multi-Accent Recognition System for English Language

S. Darshana, H. Theivaprakasham, G. Jyothish Lal, B. Premjith, V. Sowmya, Kp Soman
Published in: 2022 First International Conference on Artificial Intelligence Trends and Pattern Recognition (ICAITPR)
Publication date: 2022-03-10
DOI: 10.1109/ICAITPR51569.2022.9844177 (https://doi.org/10.1109/ICAITPR51569.2022.9844177)
Citations: 2

Abstract

Classifying the speech of non-native English speakers is challenging because many features distinguish accents. Accents vary with sex, age, formality, social status, geographical area, mother tongue, voice quality, phonemes, and prosody. This paper proposes IndicAccentDB, a novel, well-structured database of non-native Indian English speaker accents. IndicAccentDB contains speech samples from six different states and is designed to address the unbalanced-dataset (gender bias) and speaker-mismatch problems observed in the past. The paper also discusses the requirements for creating IndicAccentDB and the pre-processing tasks performed on the dataset. Furthermore, we experimented with several accent classification models, namely 1D-CNN, Support Vector Machine, Random Forest, Decision Tree, ResNet18, ResNet50, and xResNet18, using MFCC and Mel-spectrogram features to build a robust Multi-Accent Recognition System (MARS). Finally, we evaluated the proposed models on the new database and compared the results using accuracy, precision, recall, and F1-score. Based on our findings, xResNet18 was able to identify the accent classes with significant accuracy.
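
The abstract names accuracy, precision, recall, and F1-score as the comparison metrics but does not say how they were averaged across accent classes. The following is a minimal sketch of one common choice, macro-averaging over classes, written in plain Python. The function name `macro_metrics` and the labels `accent_a`, `accent_b`, `accent_c` are hypothetical placeholders, not taken from the paper or from IndicAccentDB.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: macro-averaging (an unweighted mean over classes); the
    paper's abstract does not state which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        r = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical labels and predictions, for illustration only.
    y_true = ["accent_a", "accent_a", "accent_b", "accent_b", "accent_c", "accent_c"]
    y_pred = ["accent_a", "accent_b", "accent_b", "accent_b", "accent_c", "accent_a"]
    for name, value in macro_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

Macro-averaging weights every accent class equally, which is a reasonable default when the dataset is meant to be balanced; a weighted average would be the alternative if class sizes differ.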