Saadaldeen Rashid Ahmed, Zainab Ali Abbood, Hameed Mutlag Farhan, Baraa Taha Yasen, Mohammed Rashid Ahmed, Adil Deniz Duru
Iraqi Journal for Computer Science and Mathematics. Published 2022-01-30. DOI: https://doi.org/10.52866/ijcsm.2022.01.01.012
SPEAKER IDENTIFICATION MODEL BASED ON DEEP NEURAL NETWORKS
This study aims to establish a small-footprint system for text-independent speaker recognition for a
relatively small group of speakers at a sound stage. The problem is motivated by the need on the International Space Station
(ISS) to detect whether the astronauts are speaking at a specific time. We apply machine learning to this task,
using a direct deep neural network (DNN)-based approach in
which the posterior probabilities of the output layer are used to determine the speaker's presence. In line with
the small-footprint design objective, we designed a simple DNN with only as many hidden units per layer as
needed, thereby reducing the parameter cost. Training was prepared deliberately to avoid the
usual overfitting problem and to optimize algorithmic aspects such as context-based training, activation functions,
validation, and learning rate. The reference model was tested on two commercially available databases, TIMIT (clean speech) and HTIMIT (multi-handset communication), and on a noise-added TIMIT data set that
we developed using four sound categories at three distinct signal-to-noise ratios. Briefly, we also used a dynamic pruning
method in which all layers are pruned simultaneously and the pruning mechanism is reassigned.
The usefulness of this approach was evaluated on all the above-mentioned databases.
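
The abstract says that the output-layer posteriors are used to determine the speaker's presence, but it does not say how frame-level posteriors are combined into an utterance-level decision. The sketch below shows one common possibility, offered only as an illustration and not as the authors' method: apply a softmax to each frame's output activations, average the log-posteriors over the utterance, and pick the highest-scoring speaker. The function names, the speaker labels, and the example activations are all hypothetical.

```python
import math


def softmax(logits):
    """Turn one frame's output-layer activations into posterior probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def identify_speaker(frame_logits, speakers):
    """Return (best speaker, mean log-posterior per speaker) for one utterance.

    frame_logits: list of frames, each a list with one activation per speaker.
    speakers: hypothetical speaker labels, in output-layer order.
    Averaging log-posteriors over frames is an assumption of this sketch.
    """
    if not frame_logits:
        raise ValueError("need at least one frame")
    totals = [0.0] * len(speakers)
    for logits in frame_logits:
        posteriors = softmax(logits)
        for i, p in enumerate(posteriors):
            totals[i] += math.log(max(p, 1e-12))  # floor avoids log(0)
    means = [t / len(frame_logits) for t in totals]
    best = max(range(len(speakers)), key=lambda i: means[i])
    return speakers[best], means


# Made-up activations for three frames and three speakers.
frames = [[2.0, 0.5, 0.1], [1.5, 1.7, 0.2], [2.2, 0.3, 0.4]]
speaker, scores = identify_speaker(frames, ["spk_a", "spk_b", "spk_c"])
print(speaker, [round(s, 3) for s in scores])
```

Averaging in the log domain lets a single ambiguous frame (the second one here) be outvoted by the rest of the utterance; a presence detector for one target speaker could instead compare that speaker's mean score against a threshold.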