Spoken Language Analysis in Aging Research: The Validity of AI-Generated Speech to Text Using OpenAI's Whisper
Authors: Ava Naffah, Valeria A Pfeifer, Matthias R Mehl
Journal: Gerontology, 2025, vol. 71, no. 5, pp. 417-424 (Epub 13 March 2025)
DOI: https://doi.org/10.1159/000545244
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12188092/pdf/
Introduction: Studying what older adults say can provide important insights into cognitive, affective, and social aspects of aging. Available language analysis tools generally require audio-recorded speech to be transcribed into verbatim text, a task that has historically been performed by humans. However, recent advances in AI-based language processing open up the possibility of replacing this time- and resource-intensive task with fully automatic speech to text.
Methods: This study evaluates the accuracy of two common automatic speech-to-text tools, OpenAI's Whisper and otter.ai, relative to human-corrected transcripts. Using speech from two tasks completed by 238 older adults, we applied Linguistic Inquiry and Word Count (LIWC) to compare the language features of the text produced by each transcription method. The study further assessed the degree to which manual tagging of filler words (e.g., "like," "well") common in spoken language affects the validity of the analysis.
Results: LIWC features derived from the AI-generated transcripts converged very closely with those derived from the human-corrected transcripts (average r = 0.98). Further, manual tagging of filler words did not affect validity for any LIWC category except filler words and netspeak.
Conclusion: These findings indicate that Whisper and otter.ai are valuable tools for language analysis in aging research and provide further evidence that automatic speech-to-text transcription with state-of-the-art AI tools is ready for psychological language research.
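The convergence statistic reported in the Results can be illustrated with a minimal sketch: for each LIWC feature, correlate the per-participant scores from the AI transcripts with those from the human-corrected transcripts, then average the correlations. This is an illustration under stated assumptions, not the authors' analysis code. The abstract does not say how the average r was computed, so the simple arithmetic mean used here is an assumption; the feature names and all scores are hypothetical toy values, and LIWC itself is not called.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def feature_convergence(ai_scores, human_scores):
    """Return the per-feature r between AI-based and human-corrected scores.

    Both arguments map a feature name to a list of per-participant scores,
    with participants in the same order in both mappings.
    """
    return {
        name: pearson_r(ai_scores[name], human_scores[name])
        for name in human_scores
    }


# Hypothetical toy data for four participants (made up, not study data).
human = {"pronoun": [10.0, 12.0, 14.0, 16.0], "filler": [1.0, 2.0, 3.0, 4.0]}
ai = {"pronoun": [10.5, 12.5, 14.5, 16.5], "filler": [1.0, 3.0, 2.0, 4.0]}

per_feature = feature_convergence(ai, human)

# Assumption: a simple arithmetic mean across features.
average_r = sum(per_feature.values()) / len(per_feature)
print(per_feature, average_r)
```

In this toy example the "pronoun" scores differ only by a constant offset and so correlate perfectly, while the reordered "filler" scores correlate less strongly, which mirrors the pattern of a feature category being more sensitive to transcription differences.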
About the journal:
In view of the ever-increasing fraction of elderly people, understanding the mechanisms of aging and age-related diseases has become a matter of urgent necessity. Gerontology, the oldest journal in the field, responds to this need by drawing topical contributions from multiple disciplines to support the fundamental goals of extending active life and enhancing its quality. The range of papers is classified into four sections. In the Clinical Section, the aetiology, pathogenesis, prevention and treatment of age-related diseases are discussed from a gerontological rather than a geriatric viewpoint. The Experimental Section contains up-to-date contributions from basic gerontological research. Papers dealing with behavioural development and related topics are placed in the Behavioural Science Section. Basic aspects of regeneration in different experimental biological systems as well as in the context of medical applications are dealt with in a special section that also contains information on technological advances for the elderly. Providing a primary source of high-quality papers covering all aspects of aging in humans and animals, Gerontology serves as an ideal information tool for all readers interested in the topic of aging from a broad perspective.