Statistical Modeling and Retrieval of Polyphonic Music
E. Ünal, P. Georgiou, Shrikanth S. Narayanan, E. Chew
2007 IEEE 9th Workshop on Multimedia Signal Processing, October 2007. DOI: 10.1109/MMSP.2007.4412902
In this article, we propose a solution to the problem of query-by-example for polyphonic music audio. We first present a generic mid-level representation for audio queries. Unlike previous efforts in the literature, the proposed representation does not depend on the different spectral characteristics of different musical instruments or on the accurate location of note onsets and offsets. This is achieved by first mapping the short-term frequency spectrum of consecutive audio frames to a musical space (the spiral array) and defining a tonal identity with respect to the center of effect generated by the spectral weights of the musical notes. We then use the resulting one-dimensional text representations of the audio to create n-gram statistical sequence models that track the tonal characteristics and behavior of the pieces. After applying appropriate smoothing, we build a collection of melodic n-gram models for testing. Using perplexity-based scoring, we test the likelihood of a sequence of lexical chords (an audio query) given each model in the database collection. Initial results show that, for whole-melody inputs within a 500-piece polyphonic melody database, a variation of the input piece appears in the top 5 results 81% of the time. We also tested the retrieval engine on short audio clips: using 25-second segments, variations of the input piece are among the top 5 results 75% of the time.
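The pipeline the abstract describes can be sketched end to end: map weighted pitches to positions on the spiral array, summarize each frame by its center of effect, label that point with the nearest "lexical chord" symbol, then score a query symbol sequence against smoothed n-gram models by perplexity. The sketch below is a minimal illustration, not the authors' implementation: the spiral-array parameters `R` and `H`, the add-alpha smoothing, the bigram order, and the landmark labels are all illustrative assumptions.

```python
import math
from collections import defaultdict

# Illustrative spiral-array parameters (radius and rise per fifth);
# the paper would use calibrated values.
R, H = 1.0, math.sqrt(2.0 / 15.0)

def pitch_position(k):
    """Position of pitch k (index along the line of fifths, C = 0)
    on the spiral-array helix."""
    return (R * math.sin(k * math.pi / 2),
            R * math.cos(k * math.pi / 2),
            k * H)

def center_of_effect(weighted_pitches):
    """Weighted average of pitch positions; in the paper the weights
    come from the spectral energy of notes in an audio frame."""
    total = sum(w for _, w in weighted_pitches)
    return tuple(sum(w * pitch_position(k)[i] for k, w in weighted_pitches) / total
                 for i in range(3))

def nearest_label(ce, landmarks):
    """Map a center of effect to the closest labelled landmark,
    yielding one chord symbol per frame (hypothetical labelling)."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    return min(landmarks, key=lambda name: dist(ce, landmarks[name]))

def bigram_model(sequence, alpha=0.1):
    """Add-alpha smoothed bigram model over chord symbols
    (a stand-in for the paper's smoothed n-gram models)."""
    vocab = set(sequence)
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    def prob(a, b):
        total = sum(counts[a].values())
        return (counts[a][b] + alpha) / (total + alpha * len(vocab))
    return prob, vocab

def perplexity(query, prob, vocab):
    """Perplexity of a query symbol sequence under one model;
    symbols outside the vocabulary fall back to a uniform floor."""
    logp, n = 0.0, 0
    for a, b in zip(query, query[1:]):
        p = prob(a, b) if a in vocab and b in vocab else 1.0 / max(len(vocab), 1)
        logp += math.log(p)
        n += 1
    return math.exp(-logp / max(n, 1))
```

Retrieval then amounts to computing the query's perplexity under every model in the collection and ranking the database pieces by ascending perplexity: the piece whose model finds the query least surprising is the best match.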