One Voice is All You Need: A One-Shot Approach to Recognize Your Voice

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA). Pub Date: 2022-03-01. DOI: 10.1109/CDMA54072.2022.00022

Priata Nowshin, Shahriar Rumi Dipto, Intesur Ahmed, Deboraj Chowdhury, Galib Abdun Noor, Amitabha Chakrabarty, Muhammad Tahmeed Abdullah, Moshiur Rahman

Abstract

In computer vision, one-shot learning has proven effective because it can make accurate predictions from a single labeled example per class and only a small training set. In one-shot learning, the model must make accurate predictions for each new class from just one sample of that class. In this paper, we examine a strategy for training Siamese neural networks, which use a distinctive twin structure to automatically evaluate the similarity between inputs. Our goal is to apply one-shot learning to audio classification: we extract specific audio features, train a Siamese network with triplet loss, and at test time compute similarity scores between a query set and a support set. We conducted our experiments on the LibriSpeech ASR corpus. We evaluated our approach on N-way 1-shot learning and obtained strong results for 2-way (100%), 3-way (95%), 4-way (84%), and 5-way (74%) classification, which outperform existing machine learning models by a large margin. To the best of our knowledge, this may be the first paper to investigate one-shot human speech recognition on the LibriSpeech ASR corpus using a Siamese network.
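The pipeline described above (embed audio clips with a Siamese encoder trained using triplet loss, then label each query by its similarity to a support set holding one example per speaker) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are toy vectors standing in for the output of a trained encoder, similarity is assumed to be squared Euclidean distance, the margin of 0.2 is a hypothetical value, and all function and speaker names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical type alias: an embedding produced by the (assumed) trained Siamese encoder.
Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings (assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 0.2) -> float:
    """Standard triplet loss: max(d(a, p) - d(a, n) + margin, 0).

    The anchor and positive are clips of the same speaker; the negative is a clip
    of a different speaker. The margin of 0.2 is a hypothetical default.
    """
    return max(squared_distance(anchor, positive) - squared_distance(anchor, negative) + margin, 0.0)


def classify_one_shot(query: Vector, support: Dict[str, Vector]) -> str:
    """N-way 1-shot prediction: return the label whose single support embedding is closest."""
    return min(support, key=lambda label: squared_distance(query, support[label]))


def n_way_accuracy(queries: List[Tuple[str, Vector]], support: Dict[str, Vector]) -> float:
    """Fraction of (true_label, embedding) queries classified correctly against the support set."""
    correct = sum(classify_one_shot(embedding, support) == label for label, embedding in queries)
    return correct / len(queries)


if __name__ == "__main__":
    # Toy 3-way 1-shot episode: one support embedding per (hypothetical) speaker.
    support = {"speaker_a": [1.0, 0.0], "speaker_b": [0.0, 1.0], "speaker_c": [-1.0, 0.0]}
    queries = [("speaker_a", [0.9, 0.1]), ("speaker_b", [0.2, 0.8]), ("speaker_c", [-0.7, 0.4])]
    print(n_way_accuracy(queries, support))  # prints 1.0
```

Averaging `n_way_accuracy` over many randomly sampled episodes yields an N-way 1-shot accuracy; a larger N places more candidate speakers in each support set, so each query has more ways to be misclassified.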
