Stutter-Solver: End-to-End Multi-Lingual Dysfluency Detection

Xuanru Zhou, Cheol Jun Cho, Ayati Sharma, Brittany Morin, David Baquirin, Jet Vonk, Zoe Ezzes, Zachary Miller, Boon Lead Tee, Maria Luisa Gorno-Tempini, Jiachen Lian, Gopala Anumanchipalli
Published in: IEEE Workshop on Spoken Language Technology (SLT), proceedings, vol. 2024, pp. 1039-1046. Publication date: 2024-12-01.
DOI: https://doi.org/10.1109/slt61566.2024.10832222
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12233913/pdf/

Abstract

Current de facto dysfluency modeling methods [1, 2] rely on template matching algorithms, which do not generalize to out-of-domain, real-world dysfluencies across languages and do not scale with increasing amounts of training data. To address these problems, we propose Stutter-Solver: an end-to-end framework, inspired by the YOLO [3] object detection algorithm, that detects dysfluencies and transcribes their type and timing accurately. Stutter-Solver can handle co-dysfluencies and is a natural multi-lingual dysfluency detector. To leverage scalability and boost performance, we also introduce three novel dysfluency corpora: VCTK-Pro, VCTK-Art, and AISHELL3-Pro. These corpora simulate natural spoken dysfluencies, including repetition, block, missing, replacement, and prolongation, through articulatory-encodec and TTS-based methods. Our approach achieves state-of-the-art performance on all available dysfluency corpora. Code and datasets are open-sourced at https://github.com/eureka235/Stutter-Solver.
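To make the YOLO analogy concrete, the following is a minimal sketch of how a one-dimensional, YOLO-style detector's output might be decoded into typed, time-stamped dysfluency events. It is not the authors' implementation: the abstract does not describe the model's output format, so the cell layout `(confidence, start_frac, end_frac, class_scores)`, the thresholds, and the names `Detection`, `decode`, `iou_1d`, and `nms` are all assumptions made for illustration. Only the five type labels come from the abstract.

```python
from dataclasses import dataclass

# Dysfluency types named in the abstract.
TYPES = ["repetition", "block", "missing", "replacement", "prolongation"]


@dataclass
class Detection:
    kind: str     # predicted dysfluency type
    start: float  # start time in seconds
    end: float    # end time in seconds
    score: float  # detector confidence


def decode(grid, clip_seconds, threshold=0.5):
    """Turn per-cell predictions into detections.

    Hypothetical layout: each cell is (confidence, start_frac, end_frac,
    class_scores), where the two fractions locate the event within the clip.
    """
    detections = []
    for conf, start_frac, end_frac, class_scores in grid:
        if conf < threshold or end_frac <= start_frac:
            continue  # drop low-confidence or degenerate intervals
        best = max(range(len(TYPES)), key=lambda i: class_scores[i])
        detections.append(
            Detection(TYPES[best], start_frac * clip_seconds,
                      end_frac * clip_seconds, conf)
        )
    return detections


def iou_1d(a, b):
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Per-type non-maximum suppression over time intervals.

    Suppression is applied only within a type, so overlapping events of
    different types (co-dysfluencies) can both survive.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if all(det.kind != k.kind or iou_1d(det, k) < iou_threshold
               for k in kept):
            kept.append(det)
    return sorted(kept, key=lambda d: d.start)


if __name__ == "__main__":
    # Made-up predictions for a 10-second clip, for illustration only.
    grid = [
        (0.9, 0.10, 0.30, [0.80, 0.10, 0.05, 0.03, 0.02]),
        (0.6, 0.12, 0.32, [0.70, 0.10, 0.10, 0.05, 0.05]),
        (0.8, 0.20, 0.40, [0.10, 0.10, 0.10, 0.10, 0.60]),
        (0.2, 0.50, 0.60, [0.20, 0.20, 0.20, 0.20, 0.20]),
    ]
    for det in nms(decode(grid, clip_seconds=10.0)):
        print(f"{det.kind}: {det.start:.1f}s to {det.end:.1f}s ({det.score:.2f})")
```

Running suppression per type rather than globally is one way a detector of this kind could report co-dysfluencies, since a repetition and a prolongation that overlap in time are both retained.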
