Electroglottography based voice-to-MIDI real time converter with AI voice act classification

E. Donati, Christos Chousidis
Published in: 2022 IEEE International Symposium on Medical Measurements and Applications (MeMeA), 2022-06-22
DOI: 10.1109/MeMeA54994.2022.9856413 (https://doi.org/10.1109/MeMeA54994.2022.9856413)
Citations: 1

Abstract

Real-time voice-to-MIDI conversion is a challenging task. The main issue is pitch tracking: frequency tracking of the human voice can be inaccurate and computationally expensive because of the spectral complexity of voice sounds. Moreover, in microphone-based systems, environmental noise and neighbouring sounds further degrade the accuracy of the frequency tracking. Another issue is the presence of non-singing phonemes. Because every sound picked up by the microphone goes through the conversion system, any voice sound or sounded phoneme produced by the user results in a MIDI output. This research addresses these issues with a novel experimental method that uses electroglottography, known to the medical community as EGG, as the source for pitch tracking. Electroglottography improves both the accuracy of the tracking and the ease of processing, as it delivers a direct evaluation of vocal fold operation whilst bypassing contamination from other sound sources. Furthermore, to address non-singing phonemes, the proposed method uses neural networks to classify, in real time, the vocal act produced by the user.
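
The final pitch-to-MIDI step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a fundamental-frequency estimate (for example, one derived from the EGG signal) is already available, it uses the standard equal-temperament mapping in which A4 = 440 Hz corresponds to MIDI note 69, and it stands in for the neural-network classifier with a hypothetical boolean flag `is_singing`. The names `hz_to_midi` and `gated_note` are introduced here for illustration only.

```python
import math
from typing import Optional

A4_HZ = 440.0   # standard concert-pitch reference frequency
A4_MIDI = 69    # MIDI note number assigned to A4


def hz_to_midi(f0_hz: float) -> int:
    """Map a fundamental frequency in Hz to the nearest MIDI note number."""
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    # 12 semitones per octave; one octave is a doubling of frequency.
    note = round(A4_MIDI + 12 * math.log2(f0_hz / A4_HZ))
    # Clamp to the valid MIDI note range 0..127.
    return max(0, min(127, note))


def gated_note(f0_hz: Optional[float], is_singing: bool) -> Optional[int]:
    """Return a MIDI note only when the vocal act is flagged as singing.

    `is_singing` is a hypothetical stand-in for the output of a classifier;
    `f0_hz` is None when no pitch estimate is available.
    """
    if not is_singing or f0_hz is None:
        return None
    return hz_to_midi(f0_hz)
```

For instance, `gated_note(220.0, True)` yields note 57 (A3), while the same frequency with `is_singing=False` yields `None`, so no MIDI output would be produced for a non-singing vocal act.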