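How Much Speech Data Is Needed for Tracking Language Change in Alzheimer's Disease? A Comparison of Random Length, 5-Min, and 1-Min Spontaneous Speech Samples.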

Q1 Computer Science

Digital Biomarkers. Pub Date: 2023-11-24. eCollection Date: 2023-01-01. DOI: 10.1159/000533423

Ulla Petti, Simon Baker, Anna Korhonen, Jessica Robin


Citations: 0

Abstract


Introduction: Changes in speech can act as biomarkers of cognitive decline in Alzheimer's disease (AD). While shorter speech samples would promote data collection and analysis, the minimum length of informative speech samples remains debated. This study aims to provide insight into the effect of sample length in analyzing longitudinal recordings of spontaneous speech in AD by comparing the original random length, 5- and 1-minute-long samples. We hope to understand whether capping the audio improves the accuracy of the analysis, and whether an extra 4 min conveys necessary information.

Methods: A total of 110 spontaneous speech samples were collected from decades of YouTube videos of 17 public figures, 9 of whom eventually developed AD. From these samples, 456 language features were extracted, and their text-length sensitivity, comparability, and ability to capture change over time were analyzed across three different sample lengths.
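To make the idea of capping and text-length sensitivity concrete, here is a minimal Python sketch. It is not the authors' pipeline: the input format (word plus start time in seconds), the helper names `cap_transcript` and `type_token_ratio`, and the toy sample are all assumptions made for illustration. Type-token ratio is used only because it is a well-known length-sensitive lexical measure; the abstract does not say which of the 456 features were length-sensitive.

```python
from typing import List, Tuple

# Hypothetical input format: (word, start_time_in_seconds) pairs from a
# time-aligned transcript. The study's real data format is not given.
Word = Tuple[str, float]


def cap_transcript(words: List[Word], max_seconds: float) -> List[str]:
    """Keep only the words whose start time falls before the cap."""
    return [word for word, start in words if start < max_seconds]


def type_token_ratio(tokens: List[str]) -> float:
    """Distinct words divided by total words; 0.0 for an empty sample."""
    if not tokens:
        return 0.0
    return len({t.lower() for t in tokens}) / len(tokens)


# Toy sample (not study data): one word every 0.5 s for 400 s,
# cycling through a fixed 50-word vocabulary.
vocab = [f"w{i}" for i in range(50)]
sample = [(vocab[i % 50], i * 0.5) for i in range(800)]

# Same recording at three lengths: 1 min, 5 min, and uncapped.
ttr_by_cap = {
    cap: type_token_ratio(cap_transcript(sample, cap))
    for cap in (60.0, 300.0, float("inf"))
}
```

In this toy case the same speaker gets a lower type-token ratio as the sample grows, simply because the vocabulary repeats. That is the kind of length effect that makes values from uncapped, random-length recordings hard to compare directly.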

Results: Capped audio files had advantages over the random length ones. While most extracted features were statistically comparable or highly correlated across the datasets, potential effects of sample length should be acknowledged for some features. The 5-min dataset presented the highest reliability in tracking the evolution of the disease, suggesting that the 4 extra minutes do convey informative data.
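The distinction between "statistically comparable" and "highly correlated" can be illustrated with a short Python sketch. The feature values below are made up for illustration and are not results from the study, and `pearson_r` is a plain textbook implementation rather than the authors' statistical procedure, which the abstract does not specify.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up values of one feature for the same five recordings,
# measured on the 1-min and the 5-min version of each recording.
one_min = [0.52, 0.48, 0.55, 0.41, 0.60]
five_min = [0.31, 0.29, 0.35, 0.24, 0.37]

r = pearson_r(one_min, five_min)
mean_gap = sum(one_min) / len(one_min) - sum(five_min) / len(five_min)
```

Here the two versions rank the recordings almost identically, so `r` is close to 1, yet every 1-min value is higher than its 5-min counterpart. A feature like this would be highly correlated across datasets without being directly comparable in absolute value, which is one way a sample-length effect can show up.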

Conclusion: Sample length seems to play an important role in extracting the language feature values from speech and tracking disease progress over time. We highlight the importance of further research into optimal sample length and standardization of methods when studying speech in AD.


Source journal

Digital Biomarkers (Medicine: Medicine (miscellaneous))

CiteScore: 10.60

Self-citation rate: 0.00%

Articles published: not shown

Review time: 23 weeks