Comparing Prosodic Frameworks: Investigating the Acoustic-Symbolic Relationship in ToBI and RaP

2018 IEEE Spoken Language Technology Workshop (SLT). Pub Date: 2018-12-01. DOI: 10.1109/SLT.2018.8639539

Raul Fernandez, A. Rosenberg



Abstract

ToBI is the dominant tool for symbolically describing prosodic content in American English speech material. This is due to its descriptive power and its theoretical grounding, but also to the amount of available annotated data. Recently, a modest amount of material annotated with the Rhythm and Pitch (RaP) framework was released publicly. In this paper, we investigate the acoustic-symbolic relationship under these two systems. We present experiments that examine this relationship in both directions. From acoustic to symbolic, we compare the automatic prediction of prosodic prominence as defined under the two systems. From symbolic to acoustic, we examine how well each annotation standard prescribes the acoustics of a given utterance from its symbolic sequence. We find RaP to be promising: for some aspects of these tasks, and given a comparable amount of data, it shows a somewhat stronger acoustic-symbolic relationship than ToBI. Although ToBI results are stronger when more annotated data is available, it remains to be shown whether RaP performance can scale up.
