Predicting Key Recognition Difficulty in Polyphonic Audio

C. Chuan, Aleksey Charapko
{"title":"预测复调音频的键识别困难","authors":"C. Chuan, Aleksey Charapko","doi":"10.1109/ISM.2013.82","DOIUrl":null,"url":null,"abstract":"In this paper, we present statistical models to predict the difficulty of recognizing musical keys from polyphonic audio signals. Automatic audio key finding has been studied for many years, and various approaches have been proposed and reported. Reports of these methods' performance are usually based on the proposers' own data sets. Without details on the data set, i.e., how challenging the data set is, directly comparing the effectiveness of these methods is not meaningful or even possible. Thus, in this study we focus on predicting the difficulty level of key recognition as perceived by human experts. Given an audio recording, represented as the extracted acoustic features, we apply multiple linear regression and proportional odds model to predict the difficulty level of the recording, annotated by experts as an integer on a 5-point Likert scale. We use four metrics to evaluate our prediction results: root mean square error, Pearson correlation coefficient, exact accuracy, and adjacent accuracy. We also examine the difference between experts' annotations and discuss their consistency.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"3 1","pages":"421-426"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Predicting Key Recognition Difficulty in Polyphonic Audio\",\"authors\":\"C. Chuan, Aleksey Charapko\",\"doi\":\"10.1109/ISM.2013.82\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we present statistical models to predict the difficulty of recognizing musical keys from polyphonic audio signals. Automatic audio key finding has been studied for many years, and various approaches have been proposed and reported. Reports of these methods' performance are usually based on the proposers' own data sets. Without details on the data set, i.e., how challenging the data set is, directly comparing the effectiveness of these methods is not meaningful or even possible. Thus, in this study we focus on predicting the difficulty level of key recognition as perceived by human experts. Given an audio recording, represented as the extracted acoustic features, we apply multiple linear regression and proportional odds model to predict the difficulty level of the recording, annotated by experts as an integer on a 5-point Likert scale. We use four metrics to evaluate our prediction results: root mean square error, Pearson correlation coefficient, exact accuracy, and adjacent accuracy. 
We also examine the difference between experts' annotations and discuss their consistency.\",\"PeriodicalId\":6311,\"journal\":{\"name\":\"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)\",\"volume\":\"3 1\",\"pages\":\"421-426\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-12-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISM.2013.82\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISM.2013.82","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

In this paper, we present statistical models to predict the difficulty of recognizing musical keys from polyphonic audio signals. Automatic audio key finding has been studied for many years, and various approaches have been proposed and reported. Reports of these methods' performance are usually based on the proposers' own data sets. Without details on the data set, i.e., how challenging the data set is, directly comparing the effectiveness of these methods is not meaningful or even possible. Thus, in this study we focus on predicting the difficulty level of key recognition as perceived by human experts. Given an audio recording, represented by its extracted acoustic features, we apply multiple linear regression and a proportional odds model to predict the difficulty level of the recording, annotated by experts as an integer on a 5-point Likert scale. We use four metrics to evaluate our prediction results: root mean square error, Pearson correlation coefficient, exact accuracy, and adjacent accuracy. We also examine the differences between experts' annotations and discuss their consistency.
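The abstract does not give implementation details, so the following is a minimal sketch, not the authors' code: it assumes a hypothetical feature matrix X (e.g., summary chroma statistics) and expert difficulty ratings y on the 1-5 Likert scale, fits both a multiple linear regression and a proportional odds (ordinal logistic) model, and scores them with the four metrics named in the abstract. The data, feature dimensionality, and library choices are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): predicting expert-annotated key-finding
# difficulty (1-5 Likert) from acoustic features, then scoring with the four
# metrics named in the abstract. X and y below are random placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))    # hypothetical 12-dim acoustic features per recording
y = rng.integers(1, 6, size=200)  # hypothetical expert difficulty ratings, 1..5

def evaluate(y_true, y_pred):
    """RMSE, Pearson correlation, exact accuracy, and adjacent (off-by-one) accuracy."""
    rounded = np.clip(np.rint(y_pred), 1, 5)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    pearson = np.corrcoef(y_true, y_pred)[0, 1]
    exact = np.mean(rounded == y_true)
    adjacent = np.mean(np.abs(rounded - y_true) <= 1)
    return rmse, pearson, exact, adjacent

# Multiple linear regression: treat the Likert rating as a continuous target.
linreg = LinearRegression().fit(X, y)
print("Multiple linear regression:", evaluate(y, linreg.predict(X)))

# Proportional odds (ordinal logistic) model: treat the rating as ordered categories.
po = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)
probs = np.asarray(po.predict(X))            # one probability column per rating level
po_pred = probs.argmax(axis=1) + 1           # map class index back to the 1..5 scale
print("Proportional odds model:", evaluate(y, po_pred))
```

Adjacent accuracy here counts a prediction as correct if it lands within one Likert level of the expert rating, which is one plausible reading of the metric's name; the paper should be consulted for the exact definition.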