Authors: Detai Xin, Shinnosuke Takamichi, Hiroshi Saruwatari
Journal: Speech Communication, volume 156, Article 103004
Published: 2023-11-04
DOI: 10.1016/j.specom.2023.103004
URL: https://www.sciencedirect.com/science/article/pii/S0167639323001383
JNV corpus: A corpus of Japanese nonverbal vocalizations with diverse phrases and emotions
We present the JNV (Japanese Nonverbal Vocalizations) corpus, a corpus of Japanese nonverbal vocalizations (NVs) with diverse phrases and emotions. Existing Japanese NV corpora either lack phrase diversity or focus on a small number of emotions, which makes it difficult to analyze the characteristics of Japanese NVs and to support downstream tasks such as emotion recognition. We first propose a corpus-design method with two phases: (1) collecting NV phrases through crowdsourcing; (2) recording NVs by stimulating speakers with emotional scenarios. Using this method, we then collect 420 audio clips from 4 speakers that cover 6 emotions. Results of comprehensive objective and subjective experiments demonstrate that (1) the emotions of the collected NVs can be recognized with high accuracy by both human evaluators and statistical models; (2) the collected NVs have high authenticity, comparable to previous corpora of English NVs. Additionally, we analyze the distributions of vowel types in Japanese and conduct a feature importance analysis to show which acoustic features discriminate between emotion categories in Japanese NVs. We release JNV publicly to advance further development in this field.
About the journal:
Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results.
The journal's primary objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.