The Annotated Beethoven Corpus (ABC): A Dataset of Harmonic Analyses of All Beethoven String Quartets
M. Neuwirth, Daniel Harasim, Fabian C. Moss, M. Rohrmeier
Frontiers Digit. Humanit., published 2018-07-03. DOI: 10.3389/fdigh.2018.00016 (https://doi.org/10.3389/fdigh.2018.00016)
Citations: 55
Abstract
This report describes a publicly available dataset of harmonic analyses of all Beethoven string quartets together with a new annotation scheme. The quantitative study of large datasets is gaining increasing importance in musicology, reflecting a global trend toward empirical corpus studies and big data methods in the sciences as well as the (digital) humanities. Several initiatives and publications exemplify these new developments (e.g., Mauch et al., 2007; Rohrmeier and Cross, 2008; Temperley, 2009; De Clercq and Temperley, 2011; Schubert and Cumming, 2015; Klauk and Zalkow, 2016; White and Quinn, 2016). Ever-increasing digital music resources are available online in the form of large collections of audio recordings, scanned scores, or MIDI files. Furthermore, musicologists have produced collections of symbolic and audio music repositories, e.g., the Essen Folksong collection (Schaffrath, 1995), the score collection in Humdrum/KERN format (Huron, 1997; Sapp, 2014), and the corpora of audio resources of non-Western classical music traditions gathered by the CompMusic project (Serra, 2014). However, raw audio or symbolic musical information is often insufficient to investigate more abstract structural properties of musical styles, such as harmony, counterpoint, or form. Sufficiently sophisticated and statistically fully reliable automated Music Information Retrieval (MIR) methods for structural inference are not yet available. Despite the availability of raw audio material and the recent research initiatives mentioned above, digital musicology still lacks large labeled corpora combining score and harmonic annotations. These corpora are necessary as ground truth data for the minute investigation of structural dimensions of music such as harmony.
As we elaborate below, our research addresses this gap by providing a large dataset of expert-generated harmonic labels in the stylistically coherent corpus of Ludwig van Beethoven’s string quartets, the Annotated Beethoven Corpus (ABC). This corpus will be useful for the research purposes of empirical and digital musicology, such as deepening the understanding of musical syntax, voice-leading schemata, form, and style, as well as for the development and evaluation of computational models of harmony and musical structure in general.