METHODS OF IMPROVING THE QUALITY OF SPEECH-TO-TEXT CONVERSION IN BIOMETRIC AUTHENTICATION SYSTEMS

Infocommunication and Computer Technologies (Інфокомунікаційні та комп’ютерні технології). Pub Date: 2023-01-01. DOI: 10.36994/2788-5518-2023-01-05-13

V. Korchynskyi, S. Staikuca, I. Vynogradov, O. Shvets, I. Bielova

Abstract

The article discusses methods and algorithms for speech-to-text conversion, modern open-source and commercial speech-to-text systems, and the use of these technologies in cybersecurity. It proposes building a high-quality speech-to-text conversion system and analyzes the mathematical algorithms used to reduce the error rate, which make it possible to create unique voiceprints and improve protection against forgery.

The structure of modern speech-to-text systems is described: signal pre-processing, feature extraction, acoustic modeling, language modeling, decoding, and post-processing. For each stage, research directions are proposed that can reduce the error rate of the system as a whole. Changing the training datasets and the parameters of the hidden Markov models, building a high-quality phoneme dictionary, and using language models make it possible to lower the speech recognition error rate and to apply the system to mixed-language speech such as "surzhyk" (a mixed Ukrainian-Russian vernacular). The article also examines mathematical methods for assessing the quality of speech-to-text systems, namely the word error rate (WER), and various ways of calculating it, which matter for further improvement and optimization.

Speech recognition errors are reduced, and voice forgery is made more difficult, using various methods: deep neural networks, hidden Markov models, the Baum-Welch algorithm, N-gram models, attention-based models, and the creation of a high-quality phoneme dictionary, datasets, and fillers.

Speech-to-text conversion technology can be used in biometric authentication systems to detect and analyze the unique features of a user's voice. However, modern speech-to-text systems for Ukrainian, Russian, and "surzhyk" still need improvement in their acoustic and language components.
Existing studies on the research and optimization of these systems for biometric authentication do not fully cover these issues. This motivated further research in this direction: the aim of this work is to create a speech recognition system with a minimal error rate.
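
The abstract names the word error rate (WER) as the main quality measure for speech-to-text systems. WER is conventionally defined as WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-cost alignment of the recognized text against the reference transcript, and N is the number of reference words. The following is a minimal Python sketch of that standard definition, not the authors' own calculation; it assumes whitespace tokenization and unit edit costs, and the function name `word_error_rate` is hypothetical.

```python
from typing import Dict


def word_error_rate(reference: str, hypothesis: str) -> Dict[str, float]:
    """Compute WER = (S + D + I) / N via a word-level Levenshtein alignment.

    Hypothetical helper for illustration; assumes whitespace tokenization
    and unit costs for substitutions, deletions and insertions.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    n, m = len(ref), len(hyp)
    if n == 0:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(m + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(ref[i - 1] != hyp[j - 1])
            d[i][j] = min(
                d[i - 1][j - 1] + mismatch,  # match or substitution
                d[i - 1][j] + 1,             # deletion
                d[i][j - 1] + 1,             # insertion
            )

    # Trace back one optimal alignment to split the distance into S, D, I.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(ref[i - 1] != hyp[j - 1])
            if d[i][j] == d[i - 1][j - 1] + mismatch:
                subs += mismatch
                i, j = i - 1, j - 1
                continue
        if i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1

    return {"S": subs, "D": dels, "I": ins, "N": n,
            "WER": (subs + dels + ins) / n}
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` finds one substitution and one deletion against six reference words, so WER = 2/6 ≈ 0.33. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.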
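
The abstract also lists N-gram language models among the methods for reducing recognition errors. As a hedged illustration, not the authors' model, the sketch below trains a bigram model with add-one (Laplace) smoothing, P(w | v) = (C(v, w) + 1) / (C(v) + V), where V is the number of distinct training tokens, and uses it to rescore competing transcription hypotheses. The class name `BigramModel`, the `<s>`/`</s>` sentence markers, the smoothing method and the toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Iterable, List


class BigramModel:
    """Add-one (Laplace) smoothed bigram language model.

    Illustrative sketch only: the class name, the <s>/</s> sentence markers
    and the smoothing method are assumptions, not taken from the article.
    """

    def __init__(self, sentences: Iterable[str]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sentence in sentences:
            tokens = self._tokenize(sentence)
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Vocabulary size V: distinct token types seen in training (incl. markers).
        self.vocab_size = len(self.unigrams)

    @staticmethod
    def _tokenize(sentence: str) -> List[str]:
        return ["<s>"] + sentence.lower().split() + ["</s>"]

    def prob(self, prev: str, word: str) -> float:
        """P(word | prev) = (C(prev, word) + 1) / (C(prev) + V)."""
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)

    def log_prob(self, sentence: str) -> float:
        """Natural-log probability of a whole sentence under the model."""
        tokens = self._tokenize(sentence)
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))


if __name__ == "__main__":
    lm = BigramModel(["open the door", "close the door", "open the window"])
    # Rescore two competing recognizer hypotheses; the higher log-probability wins.
    hypotheses = ["open the door", "open the dour"]
    print(max(hypotheses, key=lm.log_prob))
```

On this toy corpus the model prefers "open the door" over the acoustically similar "open the dour", which is the kind of correction a language model contributes at the decoding stage. Production N-gram toolkits typically use stronger smoothing such as Kneser-Ney; add-one smoothing is used here only because it is short.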
