2022 IEEE Spoken Language Technology Workshop (SLT)最新文献_第9页

A Comprehensive Study on Self-Supervised Distillation for Speaker Representation Learning 说话人表征学习的自监督蒸馏综合研究

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-28 DOI: 10.1109/SLT54892.2023.10022470

Zhengyang Chen, Yao Qian, Bing Han, Y. Qian, Michael Zeng

引用次数: 4

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis 基于情态特征的大规模预训练编码器在多情态情感分析中的应用

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-28 DOI: 10.1109/SLT54892.2023.10022548

Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato

{"title":"On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis","authors":"Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato","doi":"10.1109/SLT54892.2023.10022548","DOIUrl":"https://doi.org/10.1109/SLT54892.2023.10022548","url":null,"abstract":"This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis (MSA). Although the effectiveness of pre-trained encoders in various fields has been reported, conventional MSA methods employ them for only linguistic modality, and their application has not been investigated. This paper compares the features yielded by large-scale pre-trained encoders with conventional heuristic features. One each of the largest pre-trained encoders publicly available for each modality are used; CLIP-ViT, WavLM, and BERT for visual, acoustic, and linguistic modalities, respectively. Experiments on two datasets reveal that methods with domain-specific pre-trained encoders attain better performance than those with conventional features in both unimodal and multimodal scenarios. We also find it better to use the outputs of the intermediate layers of the encoders than those of the output layer. The codes are available at https://github.com/ando-hub/MSA_Pretrain.","PeriodicalId":352002,"journal":{"name":"2022 IEEE Spoken Language Technology Workshop (SLT)","volume":"71 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131435920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Monotonic Segmental Attention for Automatic Speech Recognition 语音自动识别中的单调分段注意

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-26 DOI: 10.1109/SLT54892.2023.10022818

Albert Zeyer, Robin Schmitt, Wei Zhou, R. Schluter, H. Ney

引用次数: 3

Four-in-One: a Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition 四合一:用于自动语音识别的反文本规范化、标点、大写和不流畅性的联合方法

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-26 DOI: 10.1109/SLT54892.2023.10023257

S.S. Tan, Piyush Behre, Nick Kibre, Issac Alphonso, Shuangyu Chang

引用次数: 3

Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using ß-VAE 基于ß-VAE的单次跨语言语音转换解纠缠语音表示学习

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-25 DOI: 10.1109/SLT54892.2023.10022787

Hui Lu, Disong Wang, Xixin Wu, Zhiyong Wu, Xunying Liu, Helen M. Meng

引用次数: 1

Weak-Supervised Dysarthria-Invariant Features for Spoken Language Understanding Using an Fhvae and Adversarial Training 弱监督构音障碍不变特征在口语理解中的应用

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-24 DOI: 10.1109/SLT54892.2023.10023085

Jinzi Qi, H. V. hamme

引用次数: 1

Proficiency Assessment of L2 Spoken English Using Wav2Vec 2.0 使用Wav2Vec 2.0进行第二语言口语水平评估

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-24 DOI: 10.1109/SLT54892.2023.10023019

Stefano Bannò, M. Matassoni

引用次数: 5

Guided Contrastive Self-Supervised Pre-Training for Automatic Speech Recognition 自动语音识别的引导对比自监督预训练

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-22 DOI: 10.1109/SLT54892.2023.10022676

Aparna Khare, Minhua Wu, Saurabhchand Bhati, J. Droppo, R. Maas

引用次数: 0

Combining Contrastive and Non-Contrastive Losses for Fine-Tuning Pretrained Models in Speech Analysis 结合对比和非对比损失对语音分析中预训练模型进行微调

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-21 DOI: 10.1109/SLT54892.2023.10022897

Florian Lux, Ching-Yi Chen, Ngoc Thang Vu

引用次数: 1

Improved Normalizing Flow-Based Speech Enhancement Using an all-Pole Gammatone Filterbank for Conditional Input Representation 基于条件输入表示的全极伽玛酮滤波器组的改进归一化流语音增强

2022 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2022-10-21 DOI: 10.1109/SLT54892.2023.10022898

Martin Strauss, Matteo Torcoli, B. Edler

引用次数: 1