2022 IEEE Spoken Language Technology Workshop (SLT)最新文献_第10页

Improving Semi-Supervised End-To-End Automatic Speech Recognition Using Cyclegan and Inter-Domain Losses 利用循环gan和域间损失改进半监督端到端自动语音识别

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-20 DOI: 10.1109/SLT54892.2023.10022448

C. Li, Ngoc Thang Vu

引用次数: 1

A Data-Driven Investigation of Noise-Adaptive Utterance Generation with Linguistic Modification 带有语言修饰的自适应语音生成的数据驱动研究

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-19 DOI: 10.1109/SLT54892.2023.10022437

Anupama Chingacham, Vera Demberg, D. Klakow

引用次数: 0

End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation 语音识别的端到端集成，去噪，波束成形，和自监督学习表示

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-19 DOI: 10.1109/SLT54892.2023.10023199

Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, Nobutaka Ono

引用次数: 9

G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR G-Augment:面向ASR的数据增强策略元结构的搜索

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-19 DOI: 10.1109/SLT54892.2023.10022748

Gary Wang, Ekin D.Cubuk, A. Rosenberg, Shuyang Cheng, Ron J. Weiss, B. Ramabhadran, P. Moreno, Quoc V. Le, Daniel S. Park

引用次数: 0

Two-Stage Training Method for Japanese Electrolaryngeal Speech Enhancement Based on Sequence-to-Sequence Voice Conversion 基于序列到序列语音转换的日语电喉语音增强两阶段训练方法

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-19 DOI: 10.1109/SLT54892.2023.10023033

D. Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, T. Toda

引用次数: 2

N-Best Hypotheses Reranking for Text-to-SQL Systems 文本到sql系统的n -最佳假设重新排序

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-19 DOI: 10.1109/SLT54892.2023.10023434

Lu Zeng, S. Parthasarathi, Dilek Z. Hakkani-Tür

引用次数: 6

Maestro-U: Leveraging Joint Speech-Text Representation Learning for Zero Supervised Speech ASR 利用联合语音-文本表示学习实现零监督语音ASR

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-18 DOI: 10.1109/SLT54892.2023.10022791

Zhehuai Chen, Ankur Bapna, A. Rosenberg, Yu Zhang, B. Ramabhadran, P. Moreno, Nanxin Chen

{"title":"Maestro-U: Leveraging Joint Speech-Text Representation Learning for Zero Supervised Speech ASR","authors":"Zhehuai Chen, Ankur Bapna, A. Rosenberg, Yu Zhang, B. Ramabhadran, P. Moreno, Nanxin Chen","doi":"10.1109/SLT54892.2023.10022791","DOIUrl":"https://doi.org/10.1109/SLT54892.2023.10022791","url":null,"abstract":"Training state-of-the-art Automated Speech Recognition (ASR) models typically requires a substantial amount of transcribed speech. In this work, we demonstrate that a modality-matched joint speech and text model introduced in [1] can be leveraged to train a massively multilingual ASR model without any supervised (manually transcribed) speech for some languages. This paper explores the use of jointly learnt speech and text representations in a massively multilingual, zero supervised speech, real-world setting to expand the set of languages covered by ASR with only unlabeled speech and text in the target languages. Using the FLEURS dataset, we define the task to cover 102 languages, where transcribed speech is available in 52 of these languages and can be used to improve end-to-end ASR quality on the remaining 50. First, we show that by combining speech representations with byte-level text representations and use of language embeddings, we can dramatically reduce the Character Error Rate (CER) on languages with no supervised speech from 64.8% to 30.8%, a relative reduction of 53%. Second, using a subset of South Asian languages we show that Maestro-U can promote knowledge transfer from languages with supervised speech even when there is limited to no graphemic overlap. Overall, Maestro-U closes the gap to oracle performance by 68.5% relative and reduces the CER of 19 languages below 15%.","PeriodicalId":352002,"journal":{"name":"2022 IEEE Spoken Language Technology Workshop (SLT)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130994234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 11

HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch 自动语音识别的HMM与CTC:基于从头开始的全和训练的比较

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-18 DOI: 10.1109/SLT54892.2023.10022967

Tina Raissi, Wei Zhou, S. Berger, R. Schluter, H. Ney

引用次数: 5

SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning SVLDL:使用选择性方差标签分布学习改进说话人年龄估计

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-18 DOI: 10.1109/SLT54892.2023.10023124

Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao

引用次数: 1

Sub-8-Bit Quantization for On-Device Speech Recognition: A Regularization-Free Approach 设备上语音识别的8位以下量化:一种无正则化的方法

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-17 DOI: 10.1109/SLT54892.2023.10022821

Kai Zhen, Martin H. Radfar, H. Nguyen, Grant P. Strimel, Nathan Susanj, A. Mouchtaris

引用次数: 3