Music modeling with random fields
V. Lavrenko, Jeremy Pickens
Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2003-07-28
DOI: 10.1145/860435.860515
Citation count: 4
Abstract
Interest in music information retrieval is growing rapidly. However, very few existing music retrieval techniques take advantage of recent developments in statistical modeling. In this report we discuss an application of Random Fields to the statistical modeling of polyphonic music. With such models in hand, the challenges of developing effective searching, browsing, and organization techniques for growing music collections may be successfully met.

Polyphonic music can be thought of as a two-dimensional stochastic process. Unlike text, the musical vocabulary is relatively small, containing at most several hundred discrete note symbols. What makes music so fascinating and expressive is the rich structure inherent in musical pieces. Whereas text samples can be reasonably modeled using simple unigram or bigram language models, polyphonic music is characterized by numerous periodic symmetries, repetitions, and overlapping short- and long-term interactions that are beyond the capabilities of simple Markov chains.

Random Fields are a generalization of Markov chains to multidimensional spatial processes. They are highly flexible, allowing us to model arbitrary interactions between data elements. Recently, random fields have found applications in large-vocabulary tasks such as language modeling and information extraction. One of the most influential works in the area is the 1997 publication of Della Pietra et al. [2], which outlined the algorithms used in parts of this paper. Berger et al. [1] were the first to suggest the use of maximum entropy models for natural language processing. While our work was inspired by applications of random fields to language processing, it more closely resembles the way researchers in computer vision use the framework.
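Random fields of the kind studied by Della Pietra et al. [2] assign each configuration a probability proportional to the exponential of a weighted sum of feature values. The sketch below illustrates that log-linear form on a toy piano roll. Everything in it is an assumption made for illustration, not the authors' model: the binary grid encoding, the two hand-written features `horizontal` (a chain-like interaction along time) and `vertical` (a chord-like interaction across pitches), the weights, and the brute-force normalization, which only works for tiny grids.

```python
import itertools
import math

# Toy assumption (not from the paper): a binary "piano roll" with PITCHES rows
# and STEPS columns, where grid[p][t] == 1 means pitch p sounds at time step t.
PITCHES, STEPS = 2, 3


def all_grids():
    """Enumerate every binary grid; brute force is feasible only for tiny toys."""
    for bits in itertools.product((0, 1), repeat=PITCHES * STEPS):
        yield tuple(bits[p * STEPS:(p + 1) * STEPS] for p in range(PITCHES))


def horizontal(grid):
    """Pitches held across consecutive steps: a chain-like, time-only interaction."""
    return sum(grid[p][t] * grid[p][t + 1]
               for p in range(PITCHES) for t in range(STEPS - 1))


def vertical(grid):
    """Pairs of pitches sounding at the same step: a chord-like interaction."""
    return sum(grid[p][t] * grid[q][t]
               for t in range(STEPS)
               for p in range(PITCHES) for q in range(p + 1, PITCHES))


FEATURES = [horizontal, vertical]


def distribution(weights):
    """Log-linear (Gibbs) form: p(x) = exp(sum_i w_i * f_i(x)) / Z."""
    scores = {g: math.exp(sum(w * f(g) for w, f in zip(weights, FEATURES)))
              for g in all_grids()}
    z = sum(scores.values())  # partition function, by exhaustive summation
    return {g: s / z for g, s in scores.items()}


# Hypothetical weights: reward sustained notes a little and chords more.
probs = distribution([0.5, 1.5])
chord = ((1, 0, 0), (1, 0, 0))      # both pitches sound together at step 0
staggered = ((1, 0, 0), (0, 1, 0))  # the same two notes, one step apart
print(probs[chord] > probs[staggered])  # True: exp(1.5) vs exp(0) before normalizing
```

With only the `horizontal` feature, the distribution factorizes into an independent chain over time for each pitch row; the `vertical` feature is what couples the rows, which is the kind of two-dimensional interaction a simple Markov chain cannot express.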
In most natural language applications, authors start with a reasonable set of features (usually single words or hand-crafted expressions), and the main challenge is to optimize the weights corresponding to these features. This works well in natural language, where words carry significant semantic content. In music, where a single note carries little meaning on its own, inducing the features of the random field is the crucial step. We will use the techniques suggested by [2] to automatically induce new high-level, salient features, such as chords and melodic progressions.
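The feature-induction idea can be sketched in a similar toy setting. This is a simplified, hypothetical illustration of greedy induction in the spirit of [2], not the authors' procedure: features are conjunctions of piano-roll cells (a chord is a conjunction of two pitches at one step), candidates are formed by extending an active feature with one more cell, each candidate is scored by the likelihood gain it would give if added alone with its own weight, and the best one is kept. Two deliberate substitutions: weights are refit by plain gradient ascent rather than the iterative scaling used in [2], and expectations are computed by enumerating all grids. The training data `DATA` and all function names are made up for this sketch.

```python
import itertools
import math

# Toy assumptions (not from the paper): a 2 x 2 binary piano roll, where cell
# (p, t) is on if pitch p sounds at step t, and a grid is the set of "on" cells.
PITCHES, STEPS = 2, 2
CELLS = [(p, t) for p in range(PITCHES) for t in range(STEPS)]
SPACE = [frozenset(c for c, bit in zip(CELLS, bits) if bit)
         for bits in itertools.product((0, 1), repeat=len(CELLS))]

# Hypothetical training grids: pitches 0 and 1 often sound together at step 0.
DATA = [
    frozenset({(0, 0), (1, 0)}),
    frozenset({(0, 0), (1, 0), (0, 1)}),
    frozenset({(0, 0), (1, 1)}),
    frozenset({(1, 1)}),
]


def fires(feature, x):
    """A feature is a conjunction of cells; it fires when all of them are on."""
    return feature <= x


def model(features, weights):
    """Probabilities over SPACE, with p(x) proportional to exp(sum of firing weights)."""
    scores = [math.exp(sum(w for f, w in zip(features, weights) if fires(f, x)))
              for x in SPACE]
    z = sum(scores)
    return [s / z for s in scores]


def expectation(feature, probs):
    return sum(pr for pr, x in zip(probs, SPACE) if fires(feature, x))


def empirical(feature, data):
    return sum(fires(feature, x) for x in data) / len(data)


def gain(p_emp, q):
    """Best gain in average log-likelihood from adding one binary feature with its
    own weight, others fixed: the Bernoulli KL divergence D(p_emp || q)."""
    return sum(a * math.log(a / b) for a, b in ((p_emp, q), (1 - p_emp, 1 - q)) if a > 0)


def fit(features, data, steps=500, lr=0.5):
    """Refit all weights by gradient ascent on log-likelihood (a stand-in for
    iterative scaling); the gradient is empirical minus model expectation."""
    weights = [0.0] * len(features)
    targets = [empirical(f, data) for f in features]
    for _ in range(steps):
        probs = model(features, weights)
        weights = [w + lr * (t - expectation(f, probs))
                   for w, t, f in zip(weights, targets, features)]
    return weights


def induce(data, rounds=3):
    """Greedily add the candidate feature with the largest gain, then refit."""
    atoms = [frozenset([c]) for c in CELLS]
    features, weights = [], []
    for _ in range(rounds):
        probs = model(features, weights)
        candidates = set(atoms) | {f | a for f in features for a in atoms if not a <= f}
        candidates -= set(features)
        best = max(candidates,
                   key=lambda g: gain(empirical(g, data), expectation(g, probs)))
        features.append(best)
        weights = fit(features, data)
    return features, weights


features, weights = induce(DATA)
for f, w in zip(features, weights):
    print(sorted(f), round(w, 3))
```

For a binary feature, the best single-feature gain has a closed form: it is the KL divergence between Bernoulli(p̃) and Bernoulli(q), where p̃ and q are the feature's empirical and current model expectations, which is what `gain` computes. As a check on `fit`, refitting the two features `{(0, 0)}` and the chord `{(0, 0), (1, 0)}` on `DATA` gives a weight of ln 2 for each, since that is where the model's expectations (0.75 and 0.5) match the empirical ones.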