双任务单音歌唱转录

IF 1.1 4区 工程技术 Q3 ACOUSTICS
Markus Schwabe, Sebastian Murgul, M. Heizmann
{"title":"双任务单音歌唱转录","authors":"Markus Schwabe, Sebastian Murgul, M. Heizmann","doi":"10.17743/jaes.2022.0040","DOIUrl":null,"url":null,"abstract":"Automatic music transcription with note level output is a current task in the field of music information retrieval. In contrast to the piano case with very good results using available large datasets, transcription of non-professional singing has been rarely investigated with deep learning approaches because of the lack of note level annotated datasets. In this work, two datasets are created concerning amateur singing recordings, one for training (synthetic singing dataset) and one for the evaluation task (SingReal dataset). The synthetic training dataset is generated by synthesizing a large scale of vocal melodies from artificial songs. Because the evaluation should represent a realistic scenario, the SingReal dataset is created from real recordings of non-professional singers. To transcribe singing notes, a new method called Dual Task Monophonic Singing Transcription is proposed, which divides the problem of singing transcription into the two subtasks onset detection and pitch estimation, realized by two small independent neural networks. This approach achieves a note level F1 score of 74.19% on the SingReal dataset, outperforming all state of the art transcription systems investigated with at least 3.5% improvement. Furthermore, Dual Task Monophonic Singing Transcription can be adapted very easily to the real-time transcription case.","PeriodicalId":50008,"journal":{"name":"Journal of the Audio Engineering Society","volume":" ","pages":""},"PeriodicalIF":1.1000,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Dual Task Monophonic Singing Transcription\",\"authors\":\"Markus Schwabe, Sebastian Murgul, M. Heizmann\",\"doi\":\"10.17743/jaes.2022.0040\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic music transcription with note level output is a current task in the field of music information retrieval. In contrast to the piano case with very good results using available large datasets, transcription of non-professional singing has been rarely investigated with deep learning approaches because of the lack of note level annotated datasets. In this work, two datasets are created concerning amateur singing recordings, one for training (synthetic singing dataset) and one for the evaluation task (SingReal dataset). The synthetic training dataset is generated by synthesizing a large scale of vocal melodies from artificial songs. Because the evaluation should represent a realistic scenario, the SingReal dataset is created from real recordings of non-professional singers. To transcribe singing notes, a new method called Dual Task Monophonic Singing Transcription is proposed, which divides the problem of singing transcription into the two subtasks onset detection and pitch estimation, realized by two small independent neural networks. This approach achieves a note level F1 score of 74.19% on the SingReal dataset, outperforming all state of the art transcription systems investigated with at least 3.5% improvement. Furthermore, Dual Task Monophonic Singing Transcription can be adapted very easily to the real-time transcription case.\",\"PeriodicalId\":50008,\"journal\":{\"name\":\"Journal of the Audio Engineering Society\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.1000,\"publicationDate\":\"2022-12-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the Audio Engineering Society\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://doi.org/10.17743/jaes.2022.0040\",\"RegionNum\":4,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ACOUSTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Audio Engineering Society","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.17743/jaes.2022.0040","RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ACOUSTICS","Score":null,"Total":0}
引用次数: 0

摘要

具有音符级输出的自动音乐转录是音乐信息检索领域中的一项当前任务。与使用可用的大型数据集获得非常好结果的钢琴案例相比,由于缺乏音符级注释数据集,很少使用深度学习方法研究非专业歌唱的转录。在这项工作中,创建了两个关于业余歌唱记录的数据集,一个用于训练(合成歌唱数据集),另一个用于评估任务(SingReal数据集)。合成训练数据集是通过从人工歌曲中合成大规模的声乐旋律来生成的。因为评估应该代表一个现实的场景,所以SingReal数据集是根据非专业歌手的真实录音创建的。为了转录歌唱音符,提出了一种新的方法,称为双任务单音歌唱转录,该方法将歌唱转录问题分为两个子任务起始检测和音高估计,由两个小型独立神经网络实现。这种方法在SingReal数据集上获得了74.19%的音符级F1分数,优于所研究的所有最先进的转录系统,至少提高了3.5%。此外,双任务单音歌唱转录可以很容易地适应实时转录的情况。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Dual Task Monophonic Singing Transcription
Automatic music transcription with note level output is a current task in the field of music information retrieval. In contrast to the piano case with very good results using available large datasets, transcription of non-professional singing has been rarely investigated with deep learning approaches because of the lack of note level annotated datasets. In this work, two datasets are created concerning amateur singing recordings, one for training (synthetic singing dataset) and one for the evaluation task (SingReal dataset). The synthetic training dataset is generated by synthesizing a large scale of vocal melodies from artificial songs. Because the evaluation should represent a realistic scenario, the SingReal dataset is created from real recordings of non-professional singers. To transcribe singing notes, a new method called Dual Task Monophonic Singing Transcription is proposed, which divides the problem of singing transcription into the two subtasks onset detection and pitch estimation, realized by two small independent neural networks. This approach achieves a note level F1 score of 74.19% on the SingReal dataset, outperforming all state of the art transcription systems investigated with at least 3.5% improvement. Furthermore, Dual Task Monophonic Singing Transcription can be adapted very easily to the real-time transcription case.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of the Audio Engineering Society
Journal of the Audio Engineering Society 工程技术-工程:综合
CiteScore
3.50
自引率
14.30%
发文量
53
审稿时长
1 months
期刊介绍: The Journal of the Audio Engineering Society — the official publication of the AES — is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers. The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; pre and post reports of AES conventions and other society activities; news from AES sections around the world; Standards and Education Committee work; membership news, patents, new products, and newsworthy developments in the field of audio.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信