Speaker Identification Model Based on Deep Neural Networks

Iraqi Journal for Computer Science and Mathematics. Pub date: 2022-01-30. DOI: 10.52866/ijcsm.2022.01.01.012

Saadaldeen Rashid Ahmed, Zainab Ali Abbood, Hameed Mutlag Farhan, Baraa Taha Yasen, Mohammed Rashid Ahmed, Adil Deniz Duru


Citations: 24

Abstract




This study aims to build a small-footprint, text-independent speaker identification system for a relatively small group of speakers in a given acoustic environment. The problem is motivated by a practical need on the International Space Station (ISS): detecting whether particular astronauts are speaking at a given time. We take a machine learning approach, specifically a direct deep neural network (DNN)-based method in which the posterior probabilities of the output layer are used to determine whether a speaker is present. In line with the small-footprint design objective, we designed a simple DNN with just enough hidden layers and hidden units per layer, which keeps the parameter cost low. Training was set up deliberately to avoid the usual overfitting problem, and algorithmic aspects such as context-based training, activation functions, validation, and learning rate were optimized. The resulting reference model was tested on two commercially available databases, TIMIT (clean speech) and HTIMIT (multi-handset communication speech), and on a noise-added TIMIT data framework built from four sound categories at three distinct signal-to-noise ratios. In addition, we applied a dynamic pruning method in which all layers are pruned simultaneously and the pruning decisions are reassigned. The usefulness of this approach was evaluated on all of the above databases.
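The direct DNN approach in the abstract (context-based input, output-layer posteriors used to decide which speaker is present) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, hypothetical 13-dimensional acoustic features per frame, a context window of 5 frames on each side, ReLU hidden layers with a softmax output (one unit per enrolled speaker), and segment scoring by averaging frame-level log posteriors. The helper names (`stack_context`, `dnn_posteriors`, `identify_speaker`) and the optional `threshold` are hypothetical. The weights in the demo are random, so its decision means nothing until the network is trained.

```python
import numpy as np


def stack_context(feats, left=5, right=5):
    """Concatenate each frame with its `left` preceding and `right` following frames.

    Boundary frames are repeated at the edges; the output shape is
    (num_frames, dim * (left + right + 1)).
    """
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    # Window i holds frame t + i - left for every t, so concatenating the
    # windows column-wise gives frames t-left .. t+right in row t.
    windows = [padded[i:i + num_frames] for i in range(left + right + 1)]
    return np.concatenate(windows, axis=1)


def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def dnn_posteriors(x, weights, biases):
    """Forward pass of a small fully connected DNN: ReLU hidden layers, softmax output."""
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ w + b, 0.0)
    return softmax(h @ weights[-1] + biases[-1])


def identify_speaker(frame_posteriors, threshold=None):
    """Return the speaker index with the highest mean log posterior, plus all scores.

    If `threshold` is given and the best score falls below it, return -1,
    meaning no enrolled speaker is judged present.
    """
    scores = np.log(frame_posteriors + 1e-10).mean(axis=0)
    best = int(np.argmax(scores))
    if threshold is not None and scores[best] < threshold:
        return -1, scores
    return best, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((100, 13))  # hypothetical features: 100 frames x 13 dims
    x = stack_context(feats)                # shape (100, 143)
    dims = [x.shape[1], 64, 4]              # one small hidden layer, 4 speakers
    weights = [0.1 * rng.standard_normal((a, b)) for a, b in zip(dims[:-1], dims[1:])]
    biases = [np.zeros(b) for b in dims[1:]]
    post = dnn_posteriors(x, weights, biases)
    speaker, scores = identify_speaker(post)
    print(x.shape, post.shape, speaker)
```

Averaging log posteriors over a segment rather than voting per frame is one common choice; a presence threshold on that average is one simple way to turn identification into a "who, if anyone, is speaking" decision.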

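The noise-added TIMIT condition mixes clean speech with noise at fixed signal-to-noise ratios. A minimal sketch of that mixing step, assuming NumPy, the plain power-ratio definition SNR = 10 log10(P_speech / P_noise), and that a noise clip shorter than the utterance is looped. The abstract names neither the four sound categories nor the three SNR values, so the function name `add_noise_at_snr` and the SNR levels in the demo are hypothetical; Gaussian noise stands in for both speech and noise.

```python
import numpy as np


def add_noise_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that 10*log10(P_clean / P_noise) equals snr_db."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(clean):
        # Loop a short noise clip until it covers the whole utterance.
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # After scaling, the noise power becomes p_clean / 10**(snr_db / 10).
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)  # stand-in for 1 s of 16 kHz speech
    noise = rng.standard_normal(4000)   # stand-in for a short noise clip
    for snr in (0, 10, 20):             # hypothetical SNR levels
        noisy = add_noise_at_snr(clean, noise, snr)
        measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
        print(snr, round(float(measured), 3))
```

Real corpora often measure speech power over active speech only (for example with a speech-level meter) rather than over the whole signal, which shifts the effective SNR; the whole-signal average here is the simplest variant.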

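The abstract describes the dynamic pruning step only briefly. One plausible reading, sketched here as an assumption rather than the authors' method, is global magnitude pruning in the spirit of dynamic network surgery: a single magnitude threshold is computed over the weights of all layers at once, the dense weights keep receiving updates, and the masks are recomputed after every update so that a previously pruned weight whose magnitude grows back is restored. The function names and the `sparsity` parameter are hypothetical, and gradients are supplied by the caller (random stand-ins in the demo).

```python
import numpy as np


def global_magnitude_masks(weights, sparsity):
    """Boolean keep-masks that drop the `sparsity` fraction of smallest |w| across ALL layers jointly."""
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights])
    k = int(sparsity * all_mags.size)
    if k == 0:
        return [np.ones_like(w, dtype=bool) for w in weights]
    # k-th smallest magnitude; weights at or below it are pruned (ties may prune slightly more).
    threshold = np.partition(all_mags, k - 1)[k - 1]
    return [np.abs(w) > threshold for w in weights]


def dynamic_prune_step(dense_weights, grads, lr, sparsity):
    """Update the dense weights, then recompute masks so pruned weights can be revived.

    Returns (updated dense weights, masks, effective masked weights).
    """
    updated = [w - lr * g for w, g in zip(dense_weights, grads)]
    masks = global_magnitude_masks(updated, sparsity)
    effective = [w * m for w, m in zip(updated, masks)]
    return updated, masks, effective


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense = [rng.standard_normal((143, 64)), rng.standard_normal((64, 4))]
    for step in range(3):
        grads = [rng.standard_normal(w.shape) for w in dense]  # stand-in gradients
        dense, masks, effective = dynamic_prune_step(dense, grads, lr=0.01, sparsity=0.8)
        kept = sum(int(m.sum()) for m in masks) / sum(m.size for m in masks)
        print(step, round(kept, 3))
```

Keeping the dense weights alongside the masks is what makes the pruning "dynamic" in this sketch: the effective (masked) network is used for inference, while the dense copy lets earlier pruning decisions be reversed.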
Source journal

Iraqi Journal for Computer Science and Mathematics

CiteScore

4.30

Self-citation rate

0.00%

Articles published