Self-supervised Speech Models for Word-Level Stuttered Speech Detection

Yi-Jen Shih, Zoi Gkalitsiou, Alexandros G. Dimakis, David Harwath
arXiv:2409.10704 | arXiv - EE - Audio and Speech Processing | Published: 2024-09-16

Abstract

Clinical diagnosis of stuttering requires an assessment by a licensed speech-language pathologist. However, this process is time-consuming and requires clinicians with training and experience in stuttering and fluency disorders. Unfortunately, only a small percentage of speech-language pathologists report being comfortable working with individuals who stutter, which is inadequate to accommodate the 80 million individuals who stutter worldwide. Developing machine learning models for detecting stuttered speech would enable universal and automated screening for stuttering, allowing speech pathologists to identify and follow up with patients who are most likely to be diagnosed with a stuttering speech disorder. Previous research in this area has predominantly focused on utterance-level detection, which is not sufficient for clinical settings where word-level annotation of stuttering is the norm. In this study, we curated a stuttered speech dataset with word-level annotations and introduced a word-level stuttering speech detection model leveraging self-supervised speech models. Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection. Additionally, we conducted an extensive ablation analysis of our method, providing insight into the most important aspects of adapting self-supervised speech models for stuttered speech detection.
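To make the word-level formulation concrete, the sketch below shows one common way such a pipeline can be structured: a self-supervised encoder produces frame-level features, frames are mean-pooled over each word's time span (e.g., from a forced aligner), and a per-word classifier scores stuttering. This is an illustrative assumption, not the authors' exact architecture; the encoder is stubbed with random features, and `FRAME_DIM` and `FRAMES_PER_SEC` are typical values for models such as wav2vec 2.0, not values from the paper.

```python
import numpy as np

# Hypothetical sketch of word-level stutter detection on top of a
# self-supervised speech encoder. The encoder is stubbed out here;
# in practice `frames` would be hidden states from a model such as
# wav2vec 2.0 or WavLM (assumption, not the paper's exact setup).

FRAME_DIM = 768      # typical SSL hidden size (assumption)
FRAMES_PER_SEC = 50  # ~20 ms per frame, typical for SSL encoders

def pool_word_features(frames: np.ndarray, word_spans) -> np.ndarray:
    """Mean-pool frame features over each word's [start, end) frame span."""
    return np.stack([frames[s:e].mean(axis=0) for s, e in word_spans])

def classify_words(word_feats: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Toy linear classifier: probability that each word is stuttered."""
    logits = word_feats @ w + b
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
frames = rng.standard_normal((250, FRAME_DIM))   # stub for a 5 s utterance
word_spans = [(0, 40), (40, 120), (120, 250)]    # word boundaries from an aligner
word_feats = pool_word_features(frames, word_spans)
probs = classify_words(word_feats, rng.standard_normal(FRAME_DIM) * 0.01, 0.0)
print(word_feats.shape, probs.shape)  # (3, 768) (3,)
```

In a trained system the pooled features would feed a learned classification head, and the word spans would come from word-level annotations or forced alignment rather than being hard-coded.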