SLT: IEEE Workshop on Spoken Language Technology — Proceedings: Latest Publications

SPEECH RECOGNITION FOR ANALYSIS OF POLICE RADIO COMMUNICATION
Tejes Srivastava, Ju-Chieh Chou, Priyank Shroff, Karen Livescu, Christopher Graziul
DOI: 10.1109/slt61566.2024.10832157 | SLT 2024, pp. 906-912
Abstract: Police departments around the world use two-way radio for coordination. These broadcast police communications (BPC) are a unique source of information about everyday police activity and emergency response. Yet BPC are not transcribed, and their naturalistic audio properties make automatic transcription challenging. We collect a corpus of roughly 62,000 manually transcribed radio transmissions (~46 hours of audio) to evaluate the feasibility of automatic speech recognition (ASR) using modern recognition models. We evaluate the performance of off-the-shelf speech recognizers, models fine-tuned on BPC data, and customized end-to-end models. We find that both human and machine transcription is challenging in this domain. Large off-the-shelf ASR models perform poorly, but fine-tuned models can reach the approximate range of human performance. Our work suggests directions for future work, including analysis of short utterances and potential miscommunication in police radio interactions. We make our corpus and data annotation pipeline available to other researchers, to enable further research on recognition and analysis of police communication.
Citations: 0
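The comparison of off-the-shelf, fine-tuned, and custom recognizers above is made in terms of word error rate. As a point of reference, a minimal pure-Python WER computation (edit distance over word sequences) is sketched below; the reference/hypothesis strings are invented examples, not drawn from the BPC corpus.

```python
# Word error rate: Levenshtein distance over word sequences, divided by
# reference length. The example strings are invented, not BPC data.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / max(len(ref), 1)

print(wer("unit two three respond to the alley",
          "unit twenty three respond in the alley"))  # 2/7 ~= 0.29
```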
STUTTER-SOLVER: END-TO-END MULTI-LINGUAL DYSFLUENCY DETECTION
Xuanru Zhou, Cheol Jun Cho, Ayati Sharma, Brittany Morin, David Baquirin, Jet Vonk, Zoe Ezzes, Zachary Miller, Boon Lead Tee, Maria Luisa Gorno-Tempini, Jiachen Lian, Gopala Anumanchipalli
DOI: 10.1109/slt61566.2024.10832222 | SLT 2024, pp. 1039-1046
Abstract: Current de-facto dysfluency modeling methods [1, 2] utilize template matching algorithms which are not generalizable to out-of-domain real-world dysfluencies across languages, and are not scalable with increasing amounts of training data. To handle these problems, we propose Stutter-Solver: an end-to-end framework that detects dysfluency with accurate type and time transcription, inspired by the YOLO [3] object detection algorithm. Stutter-Solver can handle co-dysfluencies and is a natural multi-lingual dysfluency detector. To leverage scalability and boost performance, we also introduce three novel dysfluency corpora: VCTK-Pro, VCTK-Art, and AISHELL3-Pro, simulating natural spoken dysfluencies including repetition, block, missing, replacement, and prolongation through articulatory-encodec and TTS-based methods. Our approach achieves state-of-the-art performance on all available dysfluency corpora. Code and datasets are open-sourced at https://github.com/eureka235/Stutter-Solver.
Citations: 0
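Since Stutter-Solver reports dysfluencies with both a type and a time span, YOLO-style, scoring its predictions naturally resembles object-detection evaluation in one dimension. The sketch below is an illustrative assumption about the output schema (the repository linked above defines the real one): detections as typed intervals, matched to reference annotations by temporal IoU.

```python
# Hypothetical representation of typed, time-stamped dysfluency detections;
# the paper's actual output schema may differ. Predictions are matched to
# references by 1-D temporal IoU, analogous to box IoU in object detection.
from dataclasses import dataclass

@dataclass
class Dysfluency:
    kind: str     # e.g. "repetition", "block", "prolongation"
    start: float  # seconds
    end: float

def iou_1d(a: Dysfluency, b: Dysfluency) -> float:
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0

def match(preds, refs, thresh=0.5):
    """Greedy matching: a prediction is a true positive if it shares the
    reference's type and overlaps it with IoU >= thresh."""
    tp, used = 0, set()
    for p in preds:
        for i, r in enumerate(refs):
            if i not in used and p.kind == r.kind and iou_1d(p, r) >= thresh:
                tp, used = tp + 1, used | {i}
                break
    return tp, len(preds) - tp, len(refs) - tp  # TP, FP, FN
```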
STYLETTS-VC: ONE-SHOT VOICE CONVERSION BY KNOWLEDGE TRANSFER FROM STYLE-BASED TTS MODELS
Yinghao Aaron Li, Cong Han, Nima Mesgarani
DOI: 10.1109/slt54892.2023.10022498 | SLT 2022, pp. 920-927
Abstract: One-shot voice conversion (VC) aims to convert speech from any source speaker to an arbitrary target speaker with only a few seconds of reference speech from the target speaker. This relies heavily on disentangling the speaker's identity and speech content, a task that still remains challenging. Here, we propose a novel approach to learning disentangled speech representation by transfer learning from style-based text-to-speech (TTS) models. With cycle consistent and adversarial training, the style-based TTS models can perform transcription-guided one-shot VC with high fidelity and similarity. By learning an additional mel-spectrogram encoder through a teacher-student knowledge transfer and novel data augmentation scheme, our approach results in disentangled speech representation without needing the input text. The subjective evaluation shows that our approach can significantly outperform the previous state-of-the-art one-shot voice conversion models in both naturalness and similarity.
Citations: 6
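The teacher-student step described in the abstract, training a mel-spectrogram encoder to reproduce a latent that the TTS teacher derives from text, can be sketched roughly as below. All module names, tensor shapes, and the L1 objective are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal teacher-student sketch: a student mel encoder is trained to match
# the teacher's latent representation, so at inference no input text is
# needed. Shapes, modules, and the L1 loss are assumptions for illustration.
import torch
import torch.nn as nn

class MelEncoder(nn.Module):
    def __init__(self, n_mels=80, d_latent=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, d_latent, kernel_size=5, padding=2),
        )
    def forward(self, mel):   # mel: (batch, n_mels, frames)
        return self.net(mel)  # (batch, d_latent, frames)

student = MelEncoder()
mel = torch.randn(4, 80, 120)  # stand-in batch of mel spectrograms
with torch.no_grad():
    # Stand-in for the frozen TTS teacher's text-derived latent.
    teacher_latent = torch.randn(4, 512, 120)
loss = nn.functional.l1_loss(student(mel), teacher_latent)
loss.backward()
```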
COMPUTATIONAL ANALYSIS OF TRAJECTORIES OF LINGUISTIC DEVELOPMENT IN AUTISM
Emily Prud'hommeaux, Eric Morley, Masoud Rouhizadeh, Laura Silverman, Jan van Santen, Brian Roark, Richard Sproat, Sarah Kauper, Rachel DeLaHunta
DOI: 10.1109/SLT.2014.7078585 | SLT 2014, pp. 266-271
Abstract: Deficits in semantic and pragmatic expression are among the hallmark linguistic features of autism. Recent work in deriving computational correlates of clinical spoken language measures has demonstrated the utility of automated linguistic analysis for characterizing the language of children with autism. Most of this research, however, has focused either on young children still acquiring language or on small populations covering a wide age range. In this paper, we extract numerous linguistic features from narratives produced by two groups of children with and without autism from two narrow age ranges. We find that although many differences between diagnostic groups remain constant with age, certain pragmatic measures, particularly the ability to remain on topic and avoid digressions, seem to improve. These results confirm findings reported in the psychology literature while underscoring the need for careful consideration of the age range of the population under investigation when performing clinically oriented computational analysis of spoken language.
Citations: 8
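Of the "numerous linguistic features" the paper extracts from narratives, even very simple narrative-level measures illustrate the idea. The sketch below computes two such measures; the study's actual feature set is far richer and includes pragmatic measures like topic maintenance, which these surface statistics do not capture.

```python
# Two illustrative narrative-level features: lexical diversity
# (type-token ratio) and mean sentence length. The sample narrative
# is invented; the paper's feature set is much larger.
import re

def linguistic_features(narrative: str) -> dict:
    tokens = re.findall(r"[a-z']+", narrative.lower())
    sentences = [s for s in narrative.split(".") if s.strip()]
    return {
        "type_token_ratio": len(set(tokens)) / max(len(tokens), 1),
        "mean_sentence_length": len(tokens) / max(len(sentences), 1),
    }

print(linguistic_features("The frog jumped. The boy chased the frog."))
# {'type_token_ratio': 0.625, 'mean_sentence_length': 4.0}
```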
ROBUST DETECTION OF VOICED SEGMENTS IN SAMPLES OF EVERYDAY CONVERSATIONS USING UNSUPERVISED HMMS
Meysam Asgari, Izhak Shafran, Alireza Bayestehtashk
DOI: 10.1109/slt.2012.6424264 | SLT 2012, pp. 438-442
Abstract: We investigate methods for detecting voiced segments in everyday conversations from ambient recordings. Such recordings contain a high diversity of background noise, making it difficult or infeasible to collect representative labelled samples for estimating noise-specific HMM models. The popular utility get-f0 and its derivatives compute normalized cross-correlation for detecting voiced segments, which unfortunately is sensitive to different types of noise. Exploiting the fact that voiced speech is not just periodic but also rich in harmonics, we model voiced segments by adopting harmonic models, which have recently gained considerable attention. In previous work, the parameters of the model were estimated independently for each frame using a maximum likelihood criterion. However, since the distribution of harmonic coefficients depends on the articulators of speakers, we estimate the model parameters more robustly using a maximum a posteriori criterion. We use the likelihood of voicing, computed from the harmonic model, as the observation probability of an HMM and detect speech using this unsupervised HMM. The one caveat of the harmonic model is that it fails to distinguish speech from other stationary harmonic noise. We rectify this weakness by taking advantage of the non-stationary property of speech. We evaluate our models empirically on the task of detecting speech in a large corpus of everyday speech and demonstrate that these models perform significantly better than the standard voice detection algorithm employed in popular tools.
Citations: 0
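The harmonic model at the core of this paper has a compact frame-level form: a voiced frame is modeled as a weighted sum of sinusoids at integer multiples of a candidate f0, fit by least squares, and the fraction of frame energy the fit explains serves as evidence of voicing. A minimal NumPy sketch follows; the paper's MAP estimation of the harmonic coefficients and its HMM smoothing are omitted.

```python
# Frame-level harmonic model: fit [1, cos(2*pi*k*f0*t), sin(2*pi*k*f0*t)]
# for k = 1..K by least squares; the explained-energy fraction is high for
# voiced frames and low for aperiodic noise.
import numpy as np

def harmonic_fit_energy(frame, f0, fs, n_harmonics=10):
    """Fraction of frame energy explained by a harmonic model at pitch f0."""
    t = np.arange(len(frame)) / fs
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols += [np.cos(2 * np.pi * k * f0 * t), np.sin(2 * np.pi * k * f0 * t)]
    A = np.stack(cols, axis=1)
    coeffs, *_ = np.linalg.lstsq(A, frame, rcond=None)
    explained = A @ coeffs
    return np.sum(explained**2) / np.sum(frame**2)

fs = 8000
t = np.arange(200) / fs
voiced = np.cos(2 * np.pi * 150 * t) + 0.5 * np.cos(2 * np.pi * 300 * t)
noise = np.random.randn(200)
print(harmonic_fit_energy(voiced, 150.0, fs))  # close to 1.0
print(harmonic_fit_energy(noise, 150.0, fs))   # much lower
```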
Efficient prior and incremental beam width control to suppress excessive speech recognition time based on score range estimation
Satoshi Kobashikawa, Takaaki Hori, Y. Yamaguchi, Taichi Asami, H. Masataki, Satoshi Takahashi
DOI: 10.1109/SLT.2012.6424209 | SLT 2012, pp. 125-130
Citations: 0
Speech Technology Opportunities and Challenges
D. Nahamoo
DOI: 10.1109/SLT.2006.326778 | SLT 2006, p. 1
Abstract: Summary form only given. Two forces are in pursuit of discovering the possibilities of speech technology automation. First is the global research and development community, which has been hard at work improving the performance and usability of the technology. Second is the business community, which constantly evaluates the performance of the technology against the expectations of the user community for delivering solutions such as a spoken car navigation system. While performance improvement has followed a steadily positive progress curve, the market opportunity has followed a much more uncertain one. For example, the early vision of delivering a dictation solution has been on hold in recent years, although it enjoyed enormous interest in the 1990s. At the same time, some industry experts predict that this vision will be fulfilled soon because of the usability needs of the billions of mobile devices in use today. Analogies can be drawn for the use of speech technologies in call-center self-service interaction: while we have seen a much bigger market success there, some industry experts predict that web self-service will slow down the use of speech self-service. So, where does the truth lie? What are the market opportunities that are clear winners? What opportunities will open up in the future, and what are their technical challenges? In this talk, we will address some of these questions.
Citations: 2
No More Strings, please
Kevin Knight
DOI: 10.1109/SLT.2006.326779 | SLT 2006, p. 2
Abstract: Summary form only given. In natural language research, many (grammar) trees were felled in 1992, to make room for the highly successful string-based HMM industry. A small literature survived on parsing (putting a tree on a string) and syntactic language modeling (putting a weight on a string). However, trees are making a comeback. Tree transformations are turning out to be very useful in large-scale machine translation (MT), and we will cover recent developments in this area. Most of the tree techniques used in MT turn out to be generic, leading to tools and software for manipulating tree automata in general. Tree acceptors and transducers generalize HMM techniques to the world of trees, raising many interesting theoretical and practical problems.
Citations: 0
Information Extraction from speech
J. Makhoul
DOI: 10.1109/SLT.2006.326780 | SLT 2006, p. 3
Abstract: Summary form only given. The state of the art in automatic speech recognition has reached the point that searching for and extracting information from large speech repositories or streaming audio has become a growing reality. This paper summarizes the technologies that have been instrumental in making audio as searchable as text, including speech recognition; speaker clustering, segmentation, and identification; topic classification; and story segmentation. Once speech is turned into text, information extraction methods can then be applied, such as named entity extraction, finding relationships between named entities, and resolution of anaphoric references. Examples of deployed systems for information extraction from speech, which incorporate some of the aforementioned technologies, will be given.
Citations: 34
Widening the NLP Pipeline for spoken Language Processing
S. Bangalore
DOI: 10.1109/SLT.2006.326787 | SLT 2006, p. 15
Abstract: Summary form only given. A typical text-based natural language application (e.g., machine translation, summarization, information extraction) consists of a pipeline of preprocessing steps such as tokenization, stemming, part-of-speech tagging, named entity detection, chunking, and parsing. Information flows downstream through the preprocessing steps along a narrow pipe: each step transforms a single input string into a single best-solution string. However, this narrow pipe is limiting for two reasons. First, since each of the preprocessing steps is erroneous, producing a single best solution could magnify error propagation down the pipeline. Second, the preprocessing steps are forced to resolve genuine ambiguity prematurely. While widening the pipeline can potentially benefit text-based language applications, it is imperative for spoken language processing, where the output from the speech recognizer is typically a word lattice/graph. In this talk, we review how such a goal has been accomplished in tasks such as spoken language understanding, speech translation, and multimodal language processing. We also sketch methods that encode the preprocessing steps as finite-state transductions in order to exploit composition of finite-state transducers as a general constraint-propagation method.
Citations: 0
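The composition idea in this talk can be illustrated with a toy stand-in: treat each pipeline step as a relation over strings and compose relations, so every lattice alternative flows through the pipeline rather than a single best string. Real systems use weighted finite-state transducer libraries for this; the set-of-pairs version below, with invented example data, is only a conceptual sketch.

```python
# Toy relation composition: each pipeline step is a set of (input, output)
# pairs; composing two steps chains them, propagating every alternative.
# Real FST toolkits do this lazily over weighted automata.
def compose(r1, r2):
    """Compose two relations given as sets of (input, output) pairs."""
    return {(a, c) for (a, b1) in r1 for (b2, c) in r2 if b1 == b2}

# ASR lattice alternatives as an identity relation over hypotheses.
lattice = {("flights to boston", "flights to boston"),
           ("flights to austin", "flights to austin")}
# An invented "understanding" step mapping strings to semantic frames.
understanding = {("flights to boston", "FIND_FLIGHT(dest=BOS)"),
                 ("flights to austin", "FIND_FLIGHT(dest=AUS)")}
print(compose(lattice, understanding))
# Both hypotheses survive into the semantic step; nothing was pruned early.
```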