International Conference on Text, Speech and Dialogue: Latest Publications

Advancing Hungarian Text Processing with HuSpaCy: Efficient and Accurate NLP Pipelines

International Conference on Text, Speech and Dialogue Pub Date : 2023-08-24 DOI: 10.1007/978-3-031-40498-6_6

György Orosz, Gergő Szabó, Péter Berkecz, Zsolt Szántó, Richárd Farkas

Citations: 0

A Dataset and Strong Baselines for Classification of Czech News Texts

International Conference on Text, Speech and Dialogue Pub Date : 2023-07-20 DOI: 10.48550/arXiv.2307.10666

Hynek Kydlíček, Jindřich Libovický

Citations: 0

Measuring Sentiment Bias in Machine Translation

International Conference on Text, Speech and Dialogue Pub Date : 2023-06-12 DOI: 10.48550/arXiv.2306.07152

Kai Hartung, Aaricia Herygers, Shubham Kurlekar, Khabbab Zakaria, Taylan Volkan, Sören Gröttrup, Munir Georges

Citations: 0

Transfer Learning of Transformer-based Speech Recognition Models from Czech to Slovak

International Conference on Text, Speech and Dialogue Pub Date : 2023-06-07 DOI: 10.48550/arXiv.2306.04399

Jan Lehecka, J. Psutka, J. Psutka

Citations: 0

Wakeword Detection under Distribution Shifts

International Conference on Text, Speech and Dialogue Pub Date : 2022-07-13 DOI: 10.48550/arXiv.2207.06423

S. Parthasarathi, Lu Zeng, Christin Jose, Joe Wang

Abstract: We propose a novel approach for semi-supervised learning (SSL) designed to overcome distribution shifts between training and real-world data arising in the keyword spotting (KWS) task. Shifts from the training data distribution are a key challenge for real-world KWS tasks: when a new model is deployed on device, the gating of the accepted data undergoes a shift in distribution, which makes timely updates via subsequent deployments hard. Despite the shift, we assume that the marginal distribution of the labels does not change. We use a modified teacher/student training framework in which labeled training data is augmented with unlabeled data; note that the teacher does not have access to the new distribution either. To train effectively with a mix of human- and teacher-labeled data, we develop a teacher labeling strategy based on confidence heuristics that reduces the entropy of the teacher model's label distribution; the data is then sampled to match the marginal distribution of the labels. Large-scale experimental results show that a convolutional neural network (CNN) trained on far-field audio, and evaluated on far-field audio drawn from a different distribution, obtains a 14.3% relative improvement in false discovery rate (FDR) at equal false reject rate (FRR), while yielding a 5% improvement in FDR under no distribution shift.

Under a more severe distribution shift, from far-field to near-field audio, with a smaller fully connected network (FCN), our approach achieves a 52% relative improvement in FDR at equal FRR, while yielding a 20% relative improvement in FDR on the original distribution.
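The abstract describes the labeling step only at a high level: confidence heuristics on the teacher's outputs, then sampling to match the label marginals. A minimal Python sketch of that general idea follows, assuming a teacher that emits one wakeword probability per unlabeled utterance. The functions `pseudo_label` and `match_marginal`, the thresholds `lo`/`hi`, and the target rate `pos_rate` are hypothetical illustrations of generic confidence-thresholded pseudo-labeling, not the authors' actual heuristics.

```python
import random
from typing import List, Tuple

Labeled = List[Tuple[int, int]]  # (utterance index, hard label: 1 = wakeword, 0 = not)


def pseudo_label(scores: List[float], lo: float = 0.1, hi: float = 0.9) -> Labeled:
    """Keep only confident teacher outputs and turn them into hard labels."""
    labeled: Labeled = []
    for i, s in enumerate(scores):
        if s >= hi:
            labeled.append((i, 1))
        elif s <= lo:
            labeled.append((i, 0))
        # Scores strictly between lo and hi are treated as uncertain and dropped.
    return labeled


def match_marginal(labeled: Labeled, pos_rate: float, seed: int = 0) -> Labeled:
    """Subsample so the positive fraction matches pos_rate (up to integer rounding)."""
    assert 0.0 < pos_rate < 1.0
    rng = random.Random(seed)
    pos = [x for x in labeled if x[1] == 1]
    neg = [x for x in labeled if x[1] == 0]
    # Largest total size achievable at the target ratio, limited by the scarcer class.
    n_total = min(len(pos) / pos_rate, len(neg) / (1.0 - pos_rate))
    n_pos = min(len(pos), round(n_total * pos_rate))
    n_neg = min(len(neg), round(n_total * (1.0 - pos_rate)))
    sample = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    rng.shuffle(sample)
    return sample


if __name__ == "__main__":
    teacher_scores = [0.95, 0.05, 0.5, 0.92, 0.02, 0.01, 0.03]
    confident = pseudo_label(teacher_scores)
    print(confident)
    print(match_marginal(confident, pos_rate=0.2))
```

Dropping scores between the two thresholds keeps only low-entropy teacher labels; the subsampling step then enforces the stated assumption that the label marginal is unchanged under the shift.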

Citations: 1

Sub 8-Bit Quantization of Streaming Keyword Spotting Models for Embedded Chipsets

International Conference on Text, Speech and Dialogue Pub Date : 2022-07-13 DOI: 10.48550/arXiv.2207.06920

Lu Zeng, S. Parthasarathi, Yuzong Liu, Alex Escott, S. Cheekatmalla, N. Strom, S. Vitaladevuni

Abstract: We propose a novel two-stage, sub-8-bit quantization-aware training algorithm for all components of a 250K-parameter feedforward, streaming, state-free keyword spotting model. In the first stage, we adapt a recently proposed quantization technique that applies a non-linear transformation with tanh(·) to the dense layer weights. In the second stage, we apply linear quantization methods to the rest of the network, including other parameters (bias, gain, batchnorm), inputs, and activations. We conduct large-scale experiments, training on 26,000 hours of de-identified production, far-field and near-field audio data and evaluating on 4,000 hours of data. We organize our results in two embedded chipset settings: a) with the commodity ARM NEON instruction set and 8-bit containers, we present accuracy, CPU, and memory results using sub-8-bit weights (4, 5, 8-bit) and 8-bit quantization of the rest of the network; b) with off-the-shelf neural network accelerators, for a range of weight bit widths (1 and 5-bit), we present accuracy results and project the reduction in memory utilization.

In both configurations, our results show that the proposed algorithm can achieve: a) parity with a full floating-point model's operating point on a detection error tradeoff (DET) curve in terms of false detection rate (FDR) at false rejection rate (FRR); b) a significant reduction in compute and memory, yielding up to a 3× improvement in CPU consumption and more than a 4× improvement in memory consumption.
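The two quantization stages can be illustrated with a short sketch. The abstract does not spell out the tanh-based technique, so the first function below assumes a DoReFa-Net-style weight quantizer (apply tanh, rescale to [0, 1], round uniformly, map back to [-1, 1]); the second is plain symmetric linear quantization of the kind described for biases, inputs, and activations. All function names are hypothetical, and this is not the paper's implementation.

```python
import math
from typing import List, Tuple


def quantize_unit(x: float, bits: int) -> float:
    """Round x in [0, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return round(x * levels) / levels


def tanh_quantize_weights(weights: List[float], bits: int) -> List[float]:
    """Assumed DoReFa-style non-linear weight quantizer (stage 1 sketch)."""
    t = [math.tanh(w) for w in weights]
    max_abs = max(abs(v) for v in t) or 1.0  # guard against all-zero weights
    # Map tanh(w) into [0, 1], quantize uniformly, then map back to [-1, 1].
    return [2.0 * quantize_unit(v / (2.0 * max_abs) + 0.5, bits) - 1.0 for v in t]


def linear_quantize(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric linear quantization to signed integers (stage 2 sketch).

    Returns the integer codes and the scale; dequantize with code * scale.
    """
    assert bits >= 2, "need at least one magnitude bit besides the sign"
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / qmax
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


if __name__ == "__main__":
    print(tanh_quantize_weights([-2.0, -0.3, 0.0, 0.7, 2.0], bits=4))
    print(linear_quantize([-1.0, 0.25, 0.5, 1.0], bits=8))
```

Because tanh saturates for large-magnitude weights, a few outliers have less influence on the normalizing constant `max_abs` than they would under a linear mapping of the raw weights.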

Citations: 4

PoCaP Corpus: A Multimodal Dataset for Smart Operating Room Speech Assistant using Interventional Radiology Workflow Analysis

International Conference on Text, Speech and Dialogue Pub Date : 2022-06-24 DOI: 10.48550/arXiv.2206.12320

K. Demir, M. May, A. Schmid, M. Uder, K. Breininger, T. Weise, A. Maier, Seung Hee Yang

Abstract: This paper presents a new multimodal interventional radiology dataset called the PoCaP (Port Catheter Placement) Corpus. The corpus consists of speech and audio signals in German, X-ray images, and system commands collected from 31 PoCaP interventions performed by six surgeons, with an average duration of 81.4 ± 41.0 minutes. The corpus aims to provide a resource for developing a smart speech assistant in operating rooms. In particular, it may be used to develop a speech-controlled system that enables surgeons to control operation parameters such as C-arm movements and table positions. To record the dataset, we obtained consent from the institutional review board and the workers' council of the University Hospital Erlangen, and from the patients for data privacy. We describe the recording set-up, data structure, workflow, and preprocessing steps, and report the first PoCaP Corpus speech recognition results, with an 11.52% word error rate using pretrained models.

The findings suggest that the data has the potential to build a robust command recognition system and will allow the development of novel intervention support systems using speech and images in the medical …

Citations: 2

TOKEN is a MASK: Few-shot Named Entity Recognition with Pre-trained Language Models

International Conference on Text, Speech and Dialogue Pub Date : 2022-06-15 DOI: 10.48550/arXiv.2206.07841

A. Davody, David Ifeoluwa Adelani, Thomas Kleinbauer, D. Klakow

Citations: 2

Transformer-based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project

International Conference on Text, Speech and Dialogue Pub Date : 2022-06-15 DOI: 10.1007/978-3-031-16270-1_25

Jan Lehecka, J. Psutka, J. Psutka

Citations: 3

Going Beyond the Cookie Theft Picture Test: Detecting Cognitive Impairments using Acoustic Features

International Conference on Text, Speech and Dialogue Pub Date : 2022-06-10 DOI: 10.1007/978-3-031-16270-1_36

Franziska Braun, Andreas Erzigkeit, H. Lehfeld, T. Hillemacher, K. Riedhammer, S. Bayerl

Citations: 7