STUTTER-SOLVER: END-TO-END MULTI-LINGUAL DYSFLUENCY DETECTION

Xuanru Zhou, Cheol Jun Cho, Ayati Sharma, Brittany Morin, David Baquirin, Jet Vonk, Zoe Ezzes, Zachary Miller, Boon Lead Tee, Maria Luisa Gorno-Tempini, Jiachen Lian, Gopala Anumanchipalli

SLT 2024: IEEE Workshop on Spoken Language Technology, pp. 1039-1046, December 2024. DOI: 10.1109/slt61566.2024.10832222. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12233913/pdf/
Current de-facto dysfluency modeling methods [1, 2] rely on template-matching algorithms that neither generalize to out-of-domain, real-world dysfluencies across languages nor scale with increasing amounts of training data. To address these problems, we propose Stutter-Solver: an end-to-end framework, inspired by the YOLO [3] object detection algorithm, that detects dysfluencies with accurate type and time transcription. Stutter-Solver can handle co-dysfluencies and is a natural multi-lingual dysfluency detector. To improve scalability and boost performance, we also introduce three novel dysfluency corpora (VCTK-Pro, VCTK-Art, and AISHELL3-Pro) that simulate natural spoken dysfluencies, including repetition, block, missing, replacement, and prolongation, through articulatory-encodec and TTS-based methods. Our approach achieves state-of-the-art performance on all available dysfluency corpora. Code and datasets are open-sourced at https://github.com/eureka235/Stutter-Solver.
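A YOLO-style detector, as described in the abstract, emits (type, start, end, confidence) tuples over the audio timeline rather than frame labels. The sketch below is purely illustrative (not the authors' implementation, whose details are in the linked repository): it shows the 1-D intersection-over-union and greedy non-maximum-suppression post-processing that such temporal detectors commonly use to collapse overlapping predictions of the same dysfluency type. All names (`Region`, `iou_1d`, `nms_1d`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """One predicted dysfluency region (hypothetical output format)."""
    label: str    # dysfluency type, e.g. "repetition", "block", "prolongation"
    start: float  # onset in seconds
    end: float    # offset in seconds
    score: float  # model confidence in [0, 1]


def iou_1d(a: Region, b: Region) -> float:
    """Temporal intersection-over-union of two predicted regions."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(regions: list[Region], iou_thresh: float = 0.5) -> list[Region]:
    """Greedy non-maximum suppression: keep the highest-scoring regions and
    drop lower-scoring predictions of the same type that overlap them."""
    kept: list[Region] = []
    for r in sorted(regions, key=lambda r: r.score, reverse=True):
        if all(r.label != k.label or iou_1d(r, k) < iou_thresh for k in kept):
            kept.append(r)
    return kept
```

For example, two near-identical "repetition" predictions at 1.0-2.0 s and 1.1-2.1 s overlap with IoU ≈ 0.82, so only the higher-scoring one survives, while a non-overlapping "block" prediction is kept. The per-type guard (`r.label != k.label`) lets co-dysfluencies of different types coexist in the same time span.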