Stutter-Solver: End-to-End Multi-Lingual Dysfluency Detection
Xuanru Zhou, Cheol Jun Cho, Ayati Sharma, Brittany Morin, David Baquirin, Jet Vonk, Zoe Ezzes, Zachary Miller, Boon Lead Tee, Maria Luisa Gorno-Tempini, Jiachen Lian, Gopala Anumanchipalli
IEEE Workshop on Spoken Language Technology (SLT) 2024, pp. 1039-1046. DOI: 10.1109/slt61566.2024.10832222. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12233913/pdf/
Abstract
Current de facto dysfluency modeling methods [1, 2] rely on template matching algorithms that do not generalize to out-of-domain, real-world dysfluencies across languages and do not scale with increasing amounts of training data. To address these problems, we propose Stutter-Solver: an end-to-end framework that detects dysfluency with accurate type and time transcription, inspired by the YOLO [3] object detection algorithm. Stutter-Solver can handle co-dysfluencies and is a natural multi-lingual dysfluency detector. To improve scalability and boost performance, we also introduce three novel dysfluency corpora: VCTK-Pro, VCTK-Art, and AISHELL3-Pro, which simulate natural spoken dysfluencies, including repetition, block, missing, replacement, and prolongation, through articulatory-encodec and TTS-based methods. Our approach achieves state-of-the-art performance on all available dysfluency corpora. Code and datasets are open-sourced at https://github.com/eureka235/Stutter-Solver.
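To make the YOLO analogy concrete, below is a minimal sketch of a 1D detection head that, per temporal grid cell, jointly predicts a confidence score, a time span, and a distribution over dysfluency types. This is an illustrative assumption, not the authors' released architecture: the module name, feature dimensions, and span parameterization are all hypothetical, and the sketch assumes PyTorch.

```python
# Hypothetical sketch of a YOLO-style 1D detection head for dysfluency
# spans, illustrating joint type + time prediction. All names and shapes
# here are illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

DYSFLUENCY_TYPES = ["repetition", "block", "missing", "replacement", "prolongation"]

class DysfluencyDetectionHead(nn.Module):
    """For each temporal grid cell, predict a confidence score, a time span
    (center offset within the cell, width), and dysfluency-type logits,
    analogous to a 1D YOLO detection head over audio frames."""

    def __init__(self, feat_dim: int = 256, num_types: int = len(DYSFLUENCY_TYPES)):
        super().__init__()
        # Per cell: 1 confidence + 2 span parameters (center, width) + class logits
        self.proj = nn.Conv1d(feat_dim, 1 + 2 + num_types, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        # feats: (batch, feat_dim, num_cells) -- frame features pooled into cells
        out = self.proj(feats)                # (batch, 3 + num_types, num_cells)
        conf = torch.sigmoid(out[:, 0])       # probability a span is present
        center = torch.sigmoid(out[:, 1])     # span center offset within cell, in [0, 1]
        width = torch.exp(out[:, 2])          # span width, measured in cells
        type_logits = out[:, 3:]              # (batch, num_types, num_cells)
        return conf, center, width, type_logits

# Usage: 256-dim features over 50 temporal cells for a batch of 2 utterances.
head = DysfluencyDetectionHead(feat_dim=256)
conf, center, width, type_logits = head(torch.randn(2, 256, 50))
print(conf.shape, type_logits.shape)  # torch.Size([2, 50]) torch.Size([2, 5, 50])
```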