2022 IEEE Spoken Language Technology Workshop (SLT): Latest Publications

Keynotes
2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2023-01-09 DOI: 10.1109/slt54892.2023.10023372
Citations: 0
Organizing Committee
2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2023-01-09 DOI: 10.1109/slt54892.2023.10023005
Citations: 0
Improving Noise Robustness for Spoken Content Retrieval Using Semi-Supervised ASR and N-Best Transcripts for BERT-Based Ranking Models
2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2023-01-09 DOI: 10.1109/SLT54892.2023.10023197
Yasufumi Moriya, Gareth J. F. Jones
Abstract: BERT-based re-ranking and dense retrieval (DR) systems have been shown to improve search effectiveness for spoken content retrieval (SCR). However, both methods can still show a reduction in effectiveness when using ASR transcripts in comparison to accurate manual transcripts. We find that a known-item search task on the How2 dataset of spoken instruction videos shows a reduction in mean reciprocal rank (MRR) scores of 10-14%. As a potential method to reduce this disparity, we investigate the use of semi-supervised ASR transcripts and N-best ASR transcripts to mitigate ASR errors for spoken search using BERT-based ranking. Semi-supervised ASR transcripts brought 2-5.5% MRR improvements over standard ASR transcripts and our N-best early fusion methods for BERT DR systems improved MRR by 3-4%. Combining semi-supervised transcripts with N-best early fusion for BERT DR reduced the MRR gap in search effectiveness between manual and ASR transcripts by more than 50%, from 14.32% to 6.58%.
Citations: 0
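The MRR figures in the abstract above can be grounded with a minimal sketch of the metric for a known-item search task (the function name and toy data below are my own, not from the paper):

```python
def mean_reciprocal_rank(ranked_lists, known_items):
    """MRR: mean over queries of 1/rank of the first relevant document."""
    total = 0.0
    for ranking, relevant in zip(ranked_lists, known_items):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id == relevant:
                total += 1.0 / rank
                break  # known-item search: one relevant document per query
    return total / len(ranked_lists)

# Two queries: the known item is ranked 1st and 2nd respectively.
print(mean_reciprocal_rank([["d1", "d2"], ["d3", "d1"]], ["d1", "d1"]))  # 0.75
```

A 10-14% drop in this score, as reported for ASR versus manual transcripts, means the known item is on average found noticeably further down the ranking.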
Fine Grained Spoken Document Summarization Through Text Segmentation
2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2023-01-09 DOI: 10.1109/SLT54892.2023.10022829
Samantha Kotey, Rozenn Dahyot, N. Harte
Abstract: Podcast transcripts are long spoken documents of conversational dialogue. Challenging to summarize, podcasts cover a diverse range of topics, vary in length, and have uniquely different linguistic styles. Previous studies in podcast summarization have generated short, concise dialogue summaries. In contrast, we propose a method to generate long fine-grained summaries, which describe details of sub-topic narratives. Leveraging a readability formula, we curate a data subset to train a long sequence transformer for abstractive summarization. Through text segmentation, we filter the evaluation data and exclude specific segments of text. We apply the model to segmented data, producing different types of fine grained summaries. We show that appropriate filtering creates comparable results on ROUGE and serves as an alternative method to truncation. Experiments show our model outperforms previous studies on the Spotify podcast dataset when tasked with generating longer sequences of text.
Citations: 0
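The abstract mentions curating the training subset with a readability formula without naming it here; the Flesch Reading Ease score is one common choice and illustrates the idea (the specific formula used in the paper may differ):

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores indicate easier text.
    A corpus can be curated by thresholding this score per document."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# A 100-word transcript with short, simple sentences scores as fairly easy.
score = flesch_reading_ease(words=100, sentences=5, syllables=130)
print(round(score, 3))  # 76.555
```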
A Truly Multilingual First Pass and Monolingual Second Pass Streaming on-Device ASR System
2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2023-01-09 DOI: 10.1109/SLT54892.2023.10023346
S. Mavandadi, Bo Li, Chaoyang Zhang, B. Farris, Tara N. Sainath, Trevor Strohman
Abstract: Automatic speech recognition (ASR) systems need to be accurate, have low latency, and effectively handle language switching in order to be useful for the 60% of the world population that speaks more than one language. Thus, we propose a truly multilingual first-pass and monolingual second-pass streaming on-device ASR system based on the recently developed Cascaded Encoders model. The streaming first-pass recognizes multilingual speech without needing language information, providing real-time transcription, even for code-switching speech. The second-pass uses a language dependent right context encoder to improve the recognition accuracy. On a 9 language Voice Search task, we find that a system combining shared causal encoder with decoders and non-causal encoders replicated per-language reduces word error rate (WER) by 4.4% relative to monolingual baselines. We further show this design to be parameter efficient, outperforming other architectures when matched in the number of parameters.
Citations: 2
Exploring a Unified ASR for Multiple South Indian Languages Leveraging Multilingual Acoustic and Language Models
2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2023-01-09 DOI: 10.1109/SLT54892.2023.10022380
C. Anoop, A. Ramakrishnan
Abstract: We build a single automatic speech recognition (ASR) model for several south Indian languages using a common set of intermediary labels, which can be easily mapped to the desired native script through simple lookup tables and a few rules. We use Sanskrit Library Phonetic encoding as the labeling scheme, which exploits the similarity in pronunciation across character sets of multiple Indian languages. Unlike the general approaches, which leverage common label sets only for multilingual acoustic modeling, we also explore multilingual language modeling. Our unified model improves the ASR performance in languages with limited amounts of speech data and also in out-of-domain test conditions. Also, the model performs reasonably well in languages with good representation in the training data.
Citations: 1
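The mapping from a common intermediary label set back to native scripts can be pictured with a tiny hypothetical lookup table (the labels and table entries below are illustrative; the paper uses the Sanskrit Library Phonetic encoding with its own tables and rules):

```python
# Hypothetical per-script lookup tables keyed by a shared phonetic label.
NATIVE = {
    "kannada":    {"ka": "\u0c95", "ta": "\u0ca4"},  # ಕ, ತ
    "devanagari": {"ka": "\u0915", "ta": "\u0924"},  # क, त
}

def to_native(labels, script):
    """Map a sequence of shared ASR output labels to one native script."""
    table = NATIVE[script]
    return "".join(table[label] for label in labels)

print(to_native(["ka", "ta"], "kannada"))  # the same labels render per script
```

Because the acoustic model only ever sees the shared labels, languages with little data benefit from acoustically similar sounds in the better-resourced ones.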
Untied Positional Encodings for Efficient Transformer-Based Speech Recognition
2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2023-01-09 DOI: 10.1109/SLT54892.2023.10023097
Lahiru Samarakoon, Ivan Fung
Abstract: Self-attention has become a vital component for end-to-end (E2E) automatic speech recognition (ASR). Convolution-augmented Transformer (Conformer) with relative positional encoding (RPE) achieved state-of-the-art performance. This paper proposes a positional encoding (PE) mechanism called Scaled Untied RPE that unties the feature-position correlations in the self-attention computation, and computes feature correlations and positional correlations separately using different projection matrices. In addition, we propose to scale feature correlations with the positional correlations, and the aggressiveness of this multiplicative interaction can be configured using a parameter called amplitude. Moreover, we show that the PE matrix can be sliced to reduce model parameters. Our results on National Speech Corpus (NSC) show that Transformer encoders with Scaled Untied RPE achieve relative improvements of 1.9% in accuracy and up to 50.9% in latency over a Conformer baseline respectively.
Citations: 1
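The core idea, computing feature correlations and positional correlations with separate projection matrices and coupling them multiplicatively, can be sketched roughly as follows (the matrix names, the tanh coupling, and the toy dimensions are my own illustration; the paper's exact Scaled Untied RPE formulation differs):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 4, 8                                  # toy sequence length and model dim
x = rng.normal(size=(T, d))                  # token features
p = rng.normal(size=(T, d))                  # positional embeddings

Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))  # feature projections
Uq, Uk = rng.normal(size=(d, d)), rng.normal(size=(d, d))  # separate positional projections

feat = (x @ Wq) @ (x @ Wk).T / np.sqrt(d)    # feature-feature correlations
pos = (p @ Uq) @ (p @ Uk).T / np.sqrt(d)     # position-position correlations

# "Untied": no cross feature-position terms. The amplitude parameter sets how
# aggressively the positional correlations scale the feature correlations.
amplitude = 0.5
scores = feat * (1.0 + amplitude * np.tanh(pos))  # illustrative coupling only
print(scores.shape)  # (4, 4)
```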
Personalization of CTC Speech Recognition Models
2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2023-01-09 DOI: 10.1109/SLT54892.2023.10022705
Saket Dingliwal, Monica Sunkara, S. Ronanki, Jeffrey J. Farris, K. Kirchhoff, S. Bodapati
Abstract: End-to-end speech recognition models trained using joint Connectionist Temporal Classification (CTC)-Attention loss have gained popularity recently. In these models, a non-autoregressive CTC decoder is often used at inference time due to its speed and simplicity. However, such models are hard to personalize because of their conditional independence assumption that prevents output tokens from previous time steps to influence future predictions. To tackle this, we propose a novel two-way approach that first biases the encoder with attention over a predefined list of rare long-tail and out-of-vocabulary (OOV) words and then uses dynamic boosting and phone alignment network during decoding to further bias the subword predictions. We evaluate our approach on open-source VoxPopuli and in-house medical datasets to showcase a 60% improvement in F1 score on domain-specific rare words over a strong CTC baseline.
Citations: 15
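Dynamic boosting during decoding can be pictured as a score bonus for hypotheses whose text extends a word on the predefined rare-word list (the function, boost value, and word list below are hypothetical; the paper's mechanism also involves a phone alignment network):

```python
def boosted_score(base_log_prob, hypothesis_text, biasing_list, boost=2.0):
    """Add a bonus to hypotheses whose text is a prefix of a biasing-list word."""
    matches = any(word.startswith(hypothesis_text) for word in biasing_list)
    return base_log_prob + (boost if matches else 0.0)

# "na" is a prefix of both rare medical terms, so this hypothesis is boosted.
print(boosted_score(-5.0, "na", ["naloxone", "nausea"]))  # -3.0
```

Applied at each decoding step, such a bonus keeps partial matches of rare words alive in the beam even when the acoustic score alone would prune them.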
Internal Language Model Personalization of E2E Automatic Speech Recognition Using Random Encoder Features
2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2023-01-09 DOI: 10.1109/SLT54892.2023.10022938
Adam Stooke, K. Sim, Mason Chua, Tsendsuren Munkhdalai, Trevor Strohman
Abstract: End-to-end (E2E) speech-to-text models generally require transcribed audio for training and personalization. We introduce the use of random audio encoder features, rather than speech, to fine-tune the final model layers and acquire new vocabulary from text-only data. This technique can be used for on-device personalization before the user has provided any speech data. We show improvements in the recall of new vocabulary and word error rate (WER) on held-out test sets using simulated user experiments on hybrid autoregressive transducer (HAT) models using conformer-based encoders and simple text embeddings for label processing. We compare this approach to the use of synthetic audio, finding random encoder features to be more beneficial with lower computational cost. Experiments show that the maximum benefit is gained by updating specific network components comprising a subset of those expressing the internal language model.
Citations: 1
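The text-only personalization recipe above can be sketched schematically: pair each new-vocabulary sentence with random tensors shaped like encoder output and update only the final layers (the shapes, names, and commented update call are assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_encoder_features(num_frames, encoder_dim):
    """Stand-in for speech: random vectors with the encoder's output shape."""
    return rng.normal(size=(num_frames, encoder_dim)).astype(np.float32)

new_vocab_sentences = ["call doctor acme"]   # hypothetical user-provided text
for sentence in new_vocab_sentences:
    feats = random_encoder_features(num_frames=2 * len(sentence.split()), encoder_dim=16)
    # model.update_final_layers(feats, sentence)  # fine-tune the last layers only
    print(feats.shape)  # (6, 16)
```

No transcribed (or synthesized) audio is needed, which is what makes the approach cheap enough to run on-device before any user speech exists.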
Transformer-Based Lip-Reading with Regularized Dropout and Relaxed Attention
2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2023-01-09 DOI: 10.1109/SLT54892.2023.10023442
Zhengyang Li, Timo Lohrenz, Matthias Dunkelberg, T. Fingscheidt
Abstract: End-to-end automatic lip-reading usually comprises an encoder-decoder model and an optional external language model. In this work, we introduce two regularization methods to the field of lip-reading: First, we apply the regularized dropout (R-Drop) method to transformer-based lip-reading to improve their training-inference consistency. Second, the relaxed attention technique is applied during training for a better external language model integration. We are the first to show that these two complementary approaches yield particularly strong performance if combined in the right manner. In particular, by adding an additional R-Drop loss and smoothing the attention weights in cross multi-head attention during training only, we achieve a new state of the art with a word error rate of 22.2% on Lip Reading Sentences 2 (LRS2). On LRS3, we are 2nd ranked with 25.5% WER using only 1,759 h of training data, while the 1st rank uses about 90,000 h. Our code is available at https://github.com/ifnspaml/Lipreading-RDrop-RA
Citations: 1
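Relaxed attention smooths the cross-attention weights toward a uniform distribution during training only; a minimal sketch (the gamma value and toy array below are illustrative):

```python
import numpy as np

def relax_attention(weights, gamma=0.1):
    """Blend an attention distribution with a uniform one (training only)."""
    num_keys = weights.shape[-1]
    return (1.0 - gamma) * weights + gamma / num_keys

peaked = np.array([1.0, 0.0, 0.0, 0.0])       # an over-confident attention row
relaxed = relax_attention(peaked, gamma=0.2)   # [0.85, 0.05, 0.05, 0.05]
print(relaxed)  # still sums to 1, but less peaked
```

Weakening the decoder's internal reliance on its own context this way is what leaves more room for the external language model at inference time.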