On Finding the Best Learning Model for Assessing Confidence in Speech

Shruti S. Nair, Madhumita Mohan, Jemima Rajesh, P. Chandran
Published in: Proceedings of the 2020 3rd International Conference on Machine Learning and Machine Intelligence
Publication date: 2020-09-18
DOI: 10.1145/3426826.3426838 (https://doi.org/10.1145/3426826.3426838)
Citations: 1

Abstract

The human mind is naturally conditioned to assess the confidence of another speaker, so confidence while speaking is crucial for success across most domains and situations. It is a highly useful trait in interactions and discussions, and in the right amounts it can sound pleasant or reassuring to the listener. For a person striving to sound confident, finding a human evaluator to give relevant feedback on tone and voice is not always possible. Given the growing power of neural networks and other machine learning tools, a machine could potentially serve as an evaluator that assesses the confidence in a user's speech and provides scores as feedback for the user's improvement. In this paper, we present the description, results, and analysis of our experiments in predicting the confidence of a speaker using machine learning and audio processing tools. The project involved building an unbiased dataset of audio recordings and scoring each recording on the confidence of the speaker. The audio clips were recorded by peers on campus and graded on clarity, modulation, pace, and volume. Three models were trained and tested on this dataset to predict the confidence of a speaker: a multilayer perceptron (MLP) neural network, a support vector machine (SVM), and a convolutional neural network (CNN). Our results show that the convolutional neural network produces scores with the highest accuracy, 86.3%, where accuracy is measured as closeness to the scores awarded by human assessment.
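The abstract says accuracy reflects how close model scores are to human scores but does not give a formula. The sketch below shows one plausible reading, assuming accuracy is 100 × (1 − mean absolute error / score range). The function names, the 0 to 10 score range, and the toy values are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def closeness_accuracy(predicted, human, score_min=0.0, score_max=10.0):
    """Percentage closeness of predicted scores to human scores.

    Assumed definition (not from the paper):
    100 * (1 - mean absolute error / score range).
    """
    if len(predicted) != len(human) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    score_range = score_max - score_min
    if score_range <= 0:
        raise ValueError("score_max must exceed score_min")
    mae = mean(abs(p - h) for p, h in zip(predicted, human))
    return 100.0 * (1.0 - mae / score_range)


def best_model(predictions_by_model, human, **kwargs):
    """Return the name of the closest model and the per-model accuracies."""
    table = {
        name: closeness_accuracy(pred, human, **kwargs)
        for name, pred in predictions_by_model.items()
    }
    return max(table, key=table.get), table


# Toy, made-up scores used only to exercise the functions.
human_scores = [7.0, 4.0, 9.0, 5.0]
toy_predictions = {
    "model_a": [6.0, 5.0, 8.0, 5.0],
    "model_b": [7.0, 4.0, 7.0, 5.0],
}
winner, table = best_model(toy_predictions, human_scores)
print(winner, table)
```

Under this assumed definition, a model that matches every human score exactly reaches 100%, and the comparison across models reduces to picking the one with the smallest mean absolute error.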