Predicting Key Recognition Difficulty in Polyphonic Audio

C. Chuan, Aleksey Charapko
{"title":"预测复调音频的键识别困难","authors":"C. Chuan, Aleksey Charapko","doi":"10.1109/ISM.2013.82","DOIUrl":null,"url":null,"abstract":"In this paper, we present statistical models to predict the difficulty of recognizing musical keys from polyphonic audio signals. Automatic audio key finding has been studied for many years, and various approaches have been proposed and reported. Reports of these methods' performance are usually based on the proposers' own data sets. Without details on the data set, i.e., how challenging the data set is, directly comparing the effectiveness of these methods is not meaningful or even possible. Thus, in this study we focus on predicting the difficulty level of key recognition as perceived by human experts. Given an audio recording, represented as the extracted acoustic features, we apply multiple linear regression and proportional odds model to predict the difficulty level of the recording, annotated by experts as an integer on a 5-point Likert scale. We use four metrics to evaluate our prediction results: root mean square error, Pearson correlation coefficient, exact accuracy, and adjacent accuracy. We also examine the difference between experts' annotations and discuss their consistency.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"3 1","pages":"421-426"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Predicting Key Recognition Difficulty in Polyphonic Audio\",\"authors\":\"C. Chuan, Aleksey Charapko\",\"doi\":\"10.1109/ISM.2013.82\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we present statistical models to predict the difficulty of recognizing musical keys from polyphonic audio signals. Automatic audio key finding has been studied for many years, and various approaches have been proposed and reported. Reports of these methods' performance are usually based on the proposers' own data sets. Without details on the data set, i.e., how challenging the data set is, directly comparing the effectiveness of these methods is not meaningful or even possible. Thus, in this study we focus on predicting the difficulty level of key recognition as perceived by human experts. Given an audio recording, represented as the extracted acoustic features, we apply multiple linear regression and proportional odds model to predict the difficulty level of the recording, annotated by experts as an integer on a 5-point Likert scale. We use four metrics to evaluate our prediction results: root mean square error, Pearson correlation coefficient, exact accuracy, and adjacent accuracy. 
We also examine the difference between experts' annotations and discuss their consistency.\",\"PeriodicalId\":6311,\"journal\":{\"name\":\"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)\",\"volume\":\"3 1\",\"pages\":\"421-426\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-12-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISM.2013.82\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISM.2013.82","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

In this paper, we present statistical models to predict the difficulty of recognizing musical keys from polyphonic audio signals. Automatic audio key finding has been studied for many years, and various approaches have been proposed and reported. Reports of these methods' performance are usually based on the proposers' own data sets. Without details on the data set, i.e., how challenging the data set is, directly comparing the effectiveness of these methods is not meaningful or even possible. Thus, in this study we focus on predicting the difficulty level of key recognition as perceived by human experts. Given an audio recording, represented by its extracted acoustic features, we apply multiple linear regression and a proportional odds model to predict the difficulty level of the recording, annotated by experts as an integer on a 5-point Likert scale. We use four metrics to evaluate our prediction results: root mean square error, Pearson correlation coefficient, exact accuracy, and adjacent accuracy. We also examine the differences between experts' annotations and discuss their consistency.
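The abstract does not give implementation details, so the following is a minimal sketch, not the authors' code: it assumes a hypothetical feature matrix X (e.g., summary chroma statistics) and expert difficulty ratings y on the 1-5 Likert scale, fits both a multiple linear regression and a proportional odds (ordinal logistic) model, and scores them with the four metrics named in the abstract. The data, feature dimensionality, and library choices are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): predicting expert-annotated key-finding
# difficulty (1-5 Likert) from acoustic features, then scoring with the four
# metrics named in the abstract. X and y below are random placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))    # hypothetical 12-dim acoustic features per recording
y = rng.integers(1, 6, size=200)  # hypothetical expert difficulty ratings, 1..5

def evaluate(y_true, y_pred):
    """RMSE, Pearson correlation, exact accuracy, and adjacent (off-by-one) accuracy."""
    rounded = np.clip(np.rint(y_pred), 1, 5)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    pearson = np.corrcoef(y_true, y_pred)[0, 1]
    exact = np.mean(rounded == y_true)
    adjacent = np.mean(np.abs(rounded - y_true) <= 1)
    return rmse, pearson, exact, adjacent

# Multiple linear regression: treat the Likert rating as a continuous target.
linreg = LinearRegression().fit(X, y)
print("Multiple linear regression:", evaluate(y, linreg.predict(X)))

# Proportional odds (ordinal logistic) model: treat the rating as ordered categories.
po = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)
probs = np.asarray(po.predict(X))            # one probability column per rating level
po_pred = probs.argmax(axis=1) + 1           # map class index back to the 1..5 scale
print("Proportional odds model:", evaluate(y, po_pred))
```

Adjacent accuracy here counts a prediction as correct if it lands within one Likert level of the expert rating, which is one plausible reading of the metric's name; the paper should be consulted for the exact definition.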