A Castilian Spanish corpus of parkinsonian speech.

IF 6.9 · Zone 2, comprehensive journals · Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data Pub Date : 2024-12-18 DOI:10.1038/s41597-024-04186-z

Janaína Mendes-Laureano, Jorge A Gómez-García, Alejandro Guerrero-López, Elisa Luque-Buzo, Julián D Arias-Londoño, Francisco J Grandas-Pérez, Juan I Godino-Llorente

Scientific Data, volume 11, issue 1, article 1367. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655668/pdf/

Citations: 0

Abstract

NeuroVoz: a Castillian Spanish corpus of parkinsonian speech.

The screening of Parkinson's Disease (PD) through speech is hindered by a notable lack of publicly available datasets in different languages. This fact limits the reproducibility and further exploration of existing research. To address this gap, this manuscript presents the NeuroVoz corpus consisting of 112 native Castilian-Spanish speakers, including 58 healthy controls and 54 individuals with PD, all recorded in ON state. The corpus showcases a diverse array of speech tasks: sustained vowels; diadochokinetic tests; 16 Listen-and-Repeat utterances; and spontaneous monologues. The dataset is also complemented with subjective assessments of voice quality performed by an expert according to the GRBAS scale (Grade/Roughness/Breathiness/Asthenia/Strain), as well as annotations with a thorough examination of phonation quality, intensity, speed, resonance, intelligibility, and prosody. The corpus offers a substantial resource for the exploration of the impact of PD on speech. This data set has already supported several studies, achieving a benchmark accuracy of 89% for the screening of PD. Despite these advances, the broader challenge of conducting a language-agnostic, cross-corpora analysis of Parkinsonian speech patterns remains open.
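To put the group sizes and the 89% figure in context, the short sketch below recomputes the corpus total from the two counts quoted in the abstract and derives the accuracy of a trivial classifier that always predicts the larger group. The constant names and the `majority_baseline` helper are illustrative assumptions, not part of the NeuroVoz release, and the abstract does not say which task or evaluation protocol produced the 89% figure, so the comparison is only indicative.

```python
# Figures quoted in the abstract; names below are illustrative, not from the corpus.
N_CONTROLS = 58           # healthy controls
N_PD = 54                 # individuals with PD
REPORTED_ACCURACY = 0.89  # benchmark accuracy stated in the abstract


def majority_baseline(counts):
    """Accuracy of always predicting the most frequent class."""
    total = sum(counts.values())
    return max(counts.values()) / total


counts = {"HC": N_CONTROLS, "PD": N_PD}
total = sum(counts.values())
baseline = majority_baseline(counts)

print(f"speakers: {total}")
print(f"majority-class baseline: {baseline:.1%}")
print(f"reported benchmark: {REPORTED_ACCURACY:.0%}")
```

Because the two groups are nearly balanced, the trivial baseline sits close to chance, which is why plain accuracy is a reasonable headline metric for this corpus.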

Source journal

Scientific Data Social Sciences-Education

CiteScore: 11.20

Self-citation rate: 4.10%

Articles published: 689

Review time: 16 weeks

About the journal: Scientific Data is an open-access journal focused on data. It publishes descriptions of research datasets and articles on data sharing across the natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data. The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.