György Orosz, GergHo Szab'o, P'eter Berkecz, Zsolt Sz'ant'o, Richárd Farkas
{"title":"Advancing Hungarian Text Processing with HuSpaCy: Efficient and Accurate NLP Pipelines","authors":"György Orosz, GergHo Szab'o, P'eter Berkecz, Zsolt Sz'ant'o, Richárd Farkas","doi":"10.1007/978-3-031-40498-6_6","DOIUrl":"https://doi.org/10.1007/978-3-031-40498-6_6","url":null,"abstract":"","PeriodicalId":358274,"journal":{"name":"International Conference on Text, Speech and Dialogue","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129633597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dataset and Strong Baselines for Classification of Czech News Texts","authors":"Hynek Kydl'ivcek, Jindřich Libovický","doi":"10.48550/arXiv.2307.10666","DOIUrl":"https://doi.org/10.48550/arXiv.2307.10666","url":null,"abstract":"Pre-trained models for Czech Natural Language Processing are often evaluated on purely linguistic tasks (POS tagging, parsing, NER) and relatively simple classification tasks such as sentiment classification or article classification from a single news source. As an alternative, we present CZEch~NEws~Classification~dataset (CZE-NEC), one of the largest Czech classification datasets, composed of news articles from various sources spanning over twenty years, which allows a more rigorous evaluation of such models. We define four classification tasks: news source, news category, inferred author's gender, and day of the week. To verify the task difficulty, we conducted a human evaluation, which revealed that human performance lags behind strong machine-learning baselines built upon pre-trained transformer models. Furthermore, we show that language-specific pre-trained encoder analysis outperforms selected commercially available large-scale generative language models.","PeriodicalId":358274,"journal":{"name":"International Conference on Text, Speech and Dialogue","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123936484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kai Hartung, Aaricia Herygers, Shubham Kurlekar, Khabbab Zakaria, Taylan Volkan, Sören Gröttrup, Munir Georges
{"title":"Measuring Sentiment Bias in Machine Translation","authors":"Kai Hartung, Aaricia Herygers, Shubham Kurlekar, Khabbab Zakaria, Taylan Volkan, Sören Gröttrup, Munir Georges","doi":"10.48550/arXiv.2306.07152","DOIUrl":"https://doi.org/10.48550/arXiv.2306.07152","url":null,"abstract":"Biases induced to text by generative models have become an increasingly large topic in recent years. In this paper we explore how machine translation might introduce a bias in sentiments as classified by sentiment analysis models. For this, we compare three open access machine translation models for five different languages on two parallel corpora to test if the translation process causes a shift in sentiment classes recognized in the texts. Though our statistic test indicate shifts in the label probability distributions, we find none that appears consistent enough to assume a bias induced by the translation process.","PeriodicalId":358274,"journal":{"name":"International Conference on Text, Speech and Dialogue","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132962245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer Learning of Transformer-based Speech Recognition Models from Czech to Slovak","authors":"Jan Lehecka, J. Psutka, J. Psutka","doi":"10.48550/arXiv.2306.04399","DOIUrl":"https://doi.org/10.48550/arXiv.2306.04399","url":null,"abstract":"In this paper, we are comparing several methods of training the Slovak speech recognition models based on the Transformers architecture. Specifically, we are exploring the approach of transfer learning from the existing Czech pre-trained Wav2Vec 2.0 model into Slovak. We are demonstrating the benefits of the proposed approach on three Slovak datasets. Our Slovak models scored the best results when initializing the weights from the Czech model at the beginning of the pre-training phase. Our results show that the knowledge stored in the Cezch pre-trained model can be successfully reused to solve tasks in Slovak while outperforming even much larger public multilingual models.","PeriodicalId":358274,"journal":{"name":"International Conference on Text, Speech and Dialogue","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125954990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Parthasarathi, Lu Zeng, Christin Jose, Joe Wang
{"title":"Wakeword Detection under Distribution Shifts","authors":"S. Parthasarathi, Lu Zeng, Christin Jose, Joe Wang","doi":"10.48550/arXiv.2207.06423","DOIUrl":"https://doi.org/10.48550/arXiv.2207.06423","url":null,"abstract":"We propose a novel approach for semi-supervised learning (SSL) designed to overcome distribution shifts between training and real-world data arising in the keyword spotting (KWS) task. Shifts from training data distribution are a key challenge for real-world KWS tasks: when a new model is deployed on device, the gating of the accepted data undergoes a shift in distribution, making the problem of timely updates via subsequent deployments hard. Despite the shift, we assume that the marginal distributions on labels do not change. We utilize a modified teacher/student training framework, where labeled training data is augmented with unlabeled data. Note that the teacher does not have access to the new distribution as well. To train effectively with a mix of human and teacher labeled data, we develop a teacher labeling strategy based on confidence heuristics to reduce entropy on the label distribution from the teacher model; the data is then sampled to match the marginal distribution on the labels. Large scale experimental results show that a convolutional neural network (CNN) trained on far-field audio, and evaluated on far-field audio drawn from a different distribution, obtains a 14.3% relative improvement in false discovery rate (FDR) at equal false reject rate (FRR), while yielding a 5% improvement in FDR under no distribution shift. Under a more severe distribution shift from far-field to near-field audio with a smaller fully connected network (FCN) our approach achieves a 52% relative improvement in FDR at equal FRR, while yielding a 20% relative improvement in FDR on the original distribution.","PeriodicalId":358274,"journal":{"name":"International Conference on Text, Speech and Dialogue","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125383991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lu Zeng, S. Parthasarathi, Yuzong Liu, Alex Escott, S. Cheekatmalla, N. Strom, S. Vitaladevuni
{"title":"Sub 8-Bit Quantization of Streaming Keyword Spotting Models for Embedded Chipsets","authors":"Lu Zeng, S. Parthasarathi, Yuzong Liu, Alex Escott, S. Cheekatmalla, N. Strom, S. Vitaladevuni","doi":"10.48550/arXiv.2207.06920","DOIUrl":"https://doi.org/10.48550/arXiv.2207.06920","url":null,"abstract":". We propose a novel 2-stage sub 8-bit quantization aware training algorithm for all components of a 250K parameter feedforward, streaming, state-free keyword spotting model. For the 1 st -stage, we adapt a recently proposed quantization technique using a non-linear transformation with tanh ( . ) on dense layer weights. In the 2 nd -stage, we use linear quantization methods on the rest of the network, including other parameters (bias, gain, batchnorm), inputs, and activations. We conduct large scale experiments, training on 26,000 hours of de-identified production, far-field and near-field audio data (evaluating on 4,000 hours of data). We organize our results in two embedded chipset settings: a) with commodity ARM NEON instruction set and 8-bit containers, we present accuracy, CPU, and memory results using sub 8-bit weights (4, 5, 8-bit) and 8-bit quantization of rest of the network; b) with off-the-shelf neural network accelerators, for a range of weight bit widths (1 and 5-bit), while presenting accuracy results, we project reduction in memory utilization. In both configurations, our results show that the proposed algorithm can achieve: a) parity with a full floating point model’s operating point on a detection error tradeoff (DET) curve in terms of false detection rate (FDR) at false rejection rate (FRR); b) significant reduction in compute and memory, yielding up to 3 times improvement in CPU consumption and more than 4 times improvement in memory consumption.","PeriodicalId":358274,"journal":{"name":"International Conference on Text, Speech and Dialogue","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124287653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Demir, M. May, A. Schmid, M. Uder, K. Breininger, T. Weise, A. Maier, Seung Hee Yang
{"title":"PoCaP Corpus: A Multimodal Dataset for Smart Operating Room Speech Assistant using Interventional Radiology Workflow Analysis","authors":"K. Demir, M. May, A. Schmid, M. Uder, K. Breininger, T. Weise, A. Maier, Seung Hee Yang","doi":"10.48550/arXiv.2206.12320","DOIUrl":"https://doi.org/10.48550/arXiv.2206.12320","url":null,"abstract":". This paper presents a new multimodal interventional radiology dataset, called PoCaP (Port Catheter Placement) Corpus. This corpus consists of speech and audio signals in German, X-ray images, and system commands collected from 31 PoCaP interventions by six surgeons with average duration of 81 . 4 ± 41 . 0 minutes. The corpus aims to provide a resource for developing a smart speech assistant in operating rooms. In particular, it may be used to develop a speech-controlled system that enables surgeons to control the operation parameters such as C-arm movements and table positions. In order to record the dataset, we acquired consent by the institutional review board and workers’ council in the University Hospital Erlangen and by the patients for data privacy. We describe the recording set-up, data structure, workflow and preprocessing steps, and report the first PoCaP Corpus speech recognition analysis results with 11.52% word error rate using pretrained models. The findings suggest that the data has the potential to build a robust command recognition system and will allow the development of a novel intervention support systems using speech and image in the medical","PeriodicalId":358274,"journal":{"name":"International Conference on Text, Speech and Dialogue","volume":"2022 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127601420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Davody, David Ifeoluwa Adelani, Thomas Kleinbauer, D. Klakow
{"title":"TOKEN is a MASK: Few-shot Named Entity Recognition with Pre-trained Language Models","authors":"A. Davody, David Ifeoluwa Adelani, Thomas Kleinbauer, D. Klakow","doi":"10.48550/arXiv.2206.07841","DOIUrl":"https://doi.org/10.48550/arXiv.2206.07841","url":null,"abstract":"Transferring knowledge from one domain to another is of practical importance for many tasks in natural language processing, especially when the amount of available data in the target domain is limited. In this work, we propose a novel few-shot approach to domain adaptation in the context of Named Entity Recognition (NER). We propose a two-step approach consisting of a variable base module and a template module that leverages the knowledge captured in pre-trained language models with the help of simple descriptive patterns. Our approach is simple yet versatile and can be applied in few-shot and zero-shot settings. Evaluating our lightweight approach across a number of different datasets shows that it can boost the performance of state-of-the-art baselines by 2-5% F1-score.","PeriodicalId":358274,"journal":{"name":"International Conference on Text, Speech and Dialogue","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121365468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transformer-based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project","authors":"Jan Lehecka, J. Psutka, J. Psutka","doi":"10.1007/978-3-031-16270-1_25","DOIUrl":"https://doi.org/10.1007/978-3-031-16270-1_25","url":null,"abstract":"","PeriodicalId":358274,"journal":{"name":"International Conference on Text, Speech and Dialogue","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133790623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Franziska Braun, Andreas Erzigkeit, H. Lehfeld, T. Hillemacher, K. Riedhammer, S. Bayerl
{"title":"Going Beyond the Cookie Theft Picture Test: Detecting Cognitive Impairments using Acoustic Features","authors":"Franziska Braun, Andreas Erzigkeit, H. Lehfeld, T. Hillemacher, K. Riedhammer, S. Bayerl","doi":"10.1007/978-3-031-16270-1_36","DOIUrl":"https://doi.org/10.1007/978-3-031-16270-1_36","url":null,"abstract":"","PeriodicalId":358274,"journal":{"name":"International Conference on Text, Speech and Dialogue","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125923976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}