Computer Speech and Language: Latest Articles

DSTM: A transformer-based model with dynamic-static feature fusion in speech emotion recognition
IF 3.1, CAS Tier 3, Computer Science
Computer Speech and Language, Pub Date: 2024-10-09, DOI: 10.1016/j.csl.2024.101733
Guowei Jin, Yunfeng Xu, Hong Kang, Jialin Wang, Borui Miao
{"title":"DSTM: A transformer-based model with dynamic-static feature fusion in speech emotion recognition","authors":"Guowei Jin,&nbsp;Yunfeng Xu,&nbsp;Hong Kang,&nbsp;Jialin Wang,&nbsp;Borui Miao","doi":"10.1016/j.csl.2024.101733","DOIUrl":"10.1016/j.csl.2024.101733","url":null,"abstract":"<div><div>With the support of multi-head attention, the Transformer shows remarkable results in speech emotion recognition. However, existing models still suffer from the inability to accurately locate important regions in semantic information at different time scales. To address this problem, we propose a Transformer-based network model for dynamic-static feature fusion, composed of a locally adaptive multi-head attention module and a global static attention module. The locally dynamic multi-head attention module adapts the attention window sizes and window centers of the different regions through speech samples and learnable parameters, enabling the model to adaptively discover and pay attention to valuable information embedded in speech. The global static attention module enables the model to use each element in the sequence fully and learn critical global feature information by establishing connections over the entire input sequence. We also use the data mixture training method to train our model and introduce the CENTER LOSS function to supervise the training of the model, which can better speed up the fitting speed of the model and alleviate the sample imbalance problem to a certain extent. This method achieved good performance on the IEMOCAP and MELD datasets, proving that our proposed model structure and method have better accuracy and robustness.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
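The center loss mentioned in the DSTM abstract is a standard auxiliary objective rather than something specific to this paper, and the abstract does not spell out its form. As a point of reference, here is a minimal PyTorch-style sketch of the generic center loss combined with cross-entropy; the class count, embedding size, and weight `lambda_c` are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Generic center loss: pull each embedding toward its class center."""
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        # One learnable center per emotion class (sizes are assumptions).
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Squared distance between each embedding and the center of its label.
        batch_centers = self.centers[labels]              # (batch, feat_dim)
        return ((features - batch_centers) ** 2).sum(dim=1).mean()

# Typical joint objective: cross-entropy plus a weighted center-loss term.
ce_loss = nn.CrossEntropyLoss()
center_loss = CenterLoss(num_classes=4, feat_dim=256)     # assumed sizes
lambda_c = 0.1                                            # assumed weight

def total_loss(logits, embeddings, labels):
    return ce_loss(logits, labels) + lambda_c * center_loss(embeddings, labels)
```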
FE-CFNER: Feature Enhancement-based approach for Chinese Few-shot Named Entity Recognition
IF 3.1, CAS Tier 3, Computer Science
Computer Speech and Language, Pub Date: 2024-10-09, DOI: 10.1016/j.csl.2024.101730
Sanhe Yang, Peichao Lai, Ruixiong Fang, Yanggeng Fu, Feiyang Ye, Yilei Wang
{"title":"FE-CFNER: Feature Enhancement-based approach for Chinese Few-shot Named Entity Recognition","authors":"Sanhe Yang,&nbsp;Peichao Lai,&nbsp;Ruixiong Fang,&nbsp;Yanggeng Fu,&nbsp;Feiyang Ye,&nbsp;Yilei Wang","doi":"10.1016/j.csl.2024.101730","DOIUrl":"10.1016/j.csl.2024.101730","url":null,"abstract":"<div><div>Although significant progress has been made in Chinese Named Entity Recognition (NER) methods based on deep learning, their performance often falls short in few-shot scenarios. Feature enhancement is considered a promising approach to address the issue of Chinese few-shot NER. However, traditional feature fusion methods tend to lead to the loss of important information and the integration of irrelevant information. Despite the benefits of incorporating BERT for improving entity recognition, its performance is limited when training data is insufficient. To tackle these challenges, this paper proposes a Feature Enhancement-based approach for Chinese Few-shot NER called FE-CFNER. FE-CFNER designs a double cross neural network to minimize information loss through the interaction of feature cross twice. Additionally, adaptive weights and a top-<span><math><mi>k</mi></math></span> mechanism are introduced to sparsify attention distributions, enabling the model to prioritize important information related to entities while excluding irrelevant information. To further enhance the quality of BERT embeddings, FE-CFNER employs a contrastive template for contrastive learning pre-training of BERT, enhancing BERT’s semantic understanding capability. We evaluate the proposed method on four sampled Chinese NER datasets: Weibo, Resume, Taobao, and Youku. Experimental results validate the effectiveness and superiority of FE-CFNER in Chinese few-shot NER tasks.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
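Top-k sparsification of attention, as mentioned in the FE-CFNER abstract, is a common trick that can be shown generically: keep only the k largest attention logits per query and mask the rest before the softmax. The sketch below is that generic mechanism, not the paper's exact module; the tensor shapes and the value of k are assumptions.

```python
import torch

def topk_sparse_attention(scores: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest attention logits per query; mask out the rest.

    scores: raw attention logits of shape (batch, n_queries, n_keys).
    Returns a softmax distribution that is zero outside the top-k entries.
    """
    topk_vals, _ = scores.topk(k, dim=-1)
    threshold = topk_vals[..., -1:]                      # smallest kept logit per row
    masked = scores.masked_fill(scores < threshold, float("-inf"))
    return masked.softmax(dim=-1)

# Example: one sentence, 4 query positions, 6 key positions, keep top-2 per query.
attn = topk_sparse_attention(torch.randn(1, 4, 6), k=2)
```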
Spoofing countermeasure for fake speech detection using brute force features
IF 3.1, CAS Tier 3, Computer Science
Computer Speech and Language, Pub Date: 2024-10-02, DOI: 10.1016/j.csl.2024.101732
Arsalan Rahman Mirza, Abdulbasit K. Al-Talabani
{"title":"Spoofing countermeasure for fake speech detection using brute force features","authors":"Arsalan Rahman Mirza ,&nbsp;Abdulbasit K. Al-Talabani","doi":"10.1016/j.csl.2024.101732","DOIUrl":"10.1016/j.csl.2024.101732","url":null,"abstract":"<div><div>Due to the progress in deep learning technology, techniques that generate spoofed speech have significantly emerged. Such synthetic speech can be exploited for harmful purposes, like impersonation or disseminating false information. Researchers in the area investigate the useful features for spoof detection. This paper extensively investigates three problems in spoof detection in speech, namely, the imbalanced sample per class, which may negatively affect the performance of any detection models, the effect of the feature early and late fusion, and the analysis of unseen attacks on the model. Regarding the imbalanced issue, we have proposed two approaches (a Synthetic Minority Over Sampling Technique (SMOTE)-based and a Bootstrap-based model). We have used the OpenSMILE toolkit, to extract different feature sets, their results and early and late fusion of them have been investigated. The experiments are evaluated using the ASVspoof 2019 datasets which encompass synthetic, voice-conversion, and replayed speech samples. Additionally, Support Vector Machine (SVM) and Deep Neural Network (DNN) have been adopted in the classification. The outcomes from various test scenarios indicated that neither the imbalanced nature of the dataset nor a specific feature or their fusions outperformed the brute force version of the model as the best Equal Error Rate (EER) achieved by the Imbalance model is 6.67 % and 1.80 % for both Logical Access (LA) and Physical Access (PA) respectively.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
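Two pieces of the spoofing-countermeasure pipeline above are standard enough to illustrate: rebalancing classes with SMOTE and scoring a detector with the Equal Error Rate. The sketch uses imbalanced-learn and scikit-learn with random placeholder arrays; the actual OpenSMILE features, the DNN classifier, and the ASVspoof 2019 protocol are outside its scope.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.svm import SVC
from sklearn.metrics import roc_curve

# Placeholder feature matrices; in the paper these would be OpenSMILE features.
X_train = np.random.randn(200, 88)
y_train = np.r_[np.zeros(150), np.ones(50)].astype(int)   # 0 = bona fide, 1 = spoof
X_test = np.random.randn(60, 88)
y_test = np.r_[np.zeros(40), np.ones(20)].astype(int)

# SMOTE synthesizes minority-class samples to balance the training set.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)

clf = SVC(probability=True).fit(X_bal, y_bal)
scores = clf.predict_proba(X_test)[:, 1]

# EER: the operating point where false-positive and false-negative rates are equal.
fpr, tpr, _ = roc_curve(y_test, scores)
fnr = 1 - tpr
eer = fpr[np.nanargmin(np.abs(fnr - fpr))]
print(f"EER = {eer:.2%}")
```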
A language-agnostic model of child language acquisition
IF 3.1, CAS Tier 3, Computer Science
Computer Speech and Language, Pub Date: 2024-09-30, DOI: 10.1016/j.csl.2024.101714
Louis Mahon, Omri Abend, Uri Berger, Katherine Demuth, Mark Johnson, Mark Steedman
{"title":"A language-agnostic model of child language acquisition","authors":"Louis Mahon ,&nbsp;Omri Abend ,&nbsp;Uri Berger ,&nbsp;Katherine Demuth ,&nbsp;Mark Johnson ,&nbsp;Mark Steedman","doi":"10.1016/j.csl.2024.101714","DOIUrl":"10.1016/j.csl.2024.101714","url":null,"abstract":"<div><div>This work reimplements a recent semantic bootstrapping child language acquisition (CLA) model, which was originally designed for English, and trains it to learn a new language: Hebrew. The model learns from pairs of utterances and logical forms as meaning representations, and acquires both syntax and word meanings simultaneously. The results show that the model mostly transfers to Hebrew, but that a number of factors, including the richer morphology in Hebrew, makes the learning slower and less robust. This suggests that a clear direction for future work is to enable the model to leverage the similarities between different word forms.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Evidence and Axial Attention Guided Document-level Relation Extraction
IF 3.1, CAS Tier 3, Computer Science
Computer Speech and Language, Pub Date: 2024-09-28, DOI: 10.1016/j.csl.2024.101728
Jiawei Yuan, Hongyong Leng, Yurong Qian, Jiaying Chen, Mengnan Ma, Shuxiang Hou
{"title":"Evidence and Axial Attention Guided Document-level Relation Extraction","authors":"Jiawei Yuan ,&nbsp;Hongyong Leng ,&nbsp;Yurong Qian ,&nbsp;Jiaying Chen ,&nbsp;Mengnan Ma ,&nbsp;Shuxiang Hou","doi":"10.1016/j.csl.2024.101728","DOIUrl":"10.1016/j.csl.2024.101728","url":null,"abstract":"<div><div>Document-level Relation Extraction (DocRE) aims to identify semantic relations among multiple entity pairs within a document. Most of the previous DocRE methods take the entire document as input. However, for human annotators, a small subset of sentences in the document, namely the evidence, is sufficient to infer the relation of an entity pair. Additionally, a document usually contains multiple entities, and these entities are scattered throughout various location of the document. Previous models use these entities independently, ignore the global interdependency among relation triples. To handle above issues, we propose a novel framework EAAGRE (Evidence and Axial Attention Guided Relation Extraction). Firstly, we use human-annotated evidence labels to supervise the attention module of DocRE system, making the model pay attention to the evidence sentences rather than others. Secondly, we construct an entity-level relation matrix and use axial attention to capture the global interactions among entity pairs. By doing so, we further extract the relations that require multiple entity pairs for prediction. We conduct various experiments on DocRED and have some improvement compared to baseline models, verifying the effectiveness of our model.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
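Axial attention over an entity-level relation matrix, as described in the EAAGRE abstract, can be sketched generically: self-attention runs first along the rows of the entity-pair grid (fixing the head entity) and then along the columns (fixing the tail entity), which is far cheaper than attending over all pairs at once. The layer below is a generic axial-attention block under assumed dimensions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    """Row-wise then column-wise self-attention over an (entities x entities) grid."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, grid: torch.Tensor) -> torch.Tensor:
        # grid: (batch, n_entities, n_entities, dim) of entity-pair representations.
        b, n, m, d = grid.shape
        rows = grid.reshape(b * n, m, d)                    # attend across tail entities
        rows, _ = self.row_attn(rows, rows, rows)
        grid = rows.reshape(b, n, m, d)

        cols = grid.transpose(1, 2).reshape(b * m, n, d)    # attend across head entities
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(b, m, n, d).transpose(1, 2)

pair_grid = torch.randn(2, 8, 8, 64)                        # assumed sizes
out = AxialAttention(dim=64)(pair_grid)
```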
Speech Generation for Indigenous Language Education
IF 3.1, CAS Tier 3, Computer Science
Computer Speech and Language, Pub Date: 2024-09-28, DOI: 10.1016/j.csl.2024.101723
Aidan Pine, Erica Cooper, David Guzmán, Eric Joanis, Anna Kazantseva, Ross Krekoski, Roland Kuhn, Samuel Larkin, Patrick Littell, Delaney Lothian, Akwiratékha’ Martin, Korin Richmond, Marc Tessier, Cassia Valentini-Botinhao, Dan Wells, Junichi Yamagishi
{"title":"Speech Generation for Indigenous Language Education","authors":"Aidan Pine ,&nbsp;Erica Cooper ,&nbsp;David Guzmán ,&nbsp;Eric Joanis ,&nbsp;Anna Kazantseva ,&nbsp;Ross Krekoski ,&nbsp;Roland Kuhn ,&nbsp;Samuel Larkin ,&nbsp;Patrick Littell ,&nbsp;Delaney Lothian ,&nbsp;Akwiratékha’ Martin ,&nbsp;Korin Richmond ,&nbsp;Marc Tessier ,&nbsp;Cassia Valentini-Botinhao ,&nbsp;Dan Wells ,&nbsp;Junichi Yamagishi","doi":"10.1016/j.csl.2024.101723","DOIUrl":"10.1016/j.csl.2024.101723","url":null,"abstract":"<div><div>As the quality of contemporary speech synthesis improves, so too does the interest from language communities in developing text-to-speech (TTS) systems for a variety of real-world applications. Much of the work on TTS has focused on high-resource languages, resulting in implicitly resource-intensive paths to building such systems. The goal of this paper is to provide signposts and points of reference for future low-resource speech synthesis efforts, with insights drawn from the Speech Generation for Indigenous Language Education (SGILE) project. Funded and coordinated by the National Research Council of Canada (NRC), this multi-year, multi-partner project has the goal of producing high-quality text-to-speech systems that support the teaching of Indigenous languages in a variety of educational contexts. We provide background information and motivation for the project, as well as details about our approach and project structure, including results from a multi-day requirements-gathering session. We discuss some of our key challenges, including building models with appropriate controls for educators, improving model data efficiency, and strategies for low-resource transfer learning and evaluation. Finally, we provide a detailed survey of existing speech synthesis software and introduce EveryVoice TTS, a toolkit designed specifically for low-resource speech synthesis.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhancing analysis of diadochokinetic speech using deep neural networks
IF 3.1, CAS Tier 3, Computer Science
Computer Speech and Language, Pub Date: 2024-09-02, DOI: 10.1016/j.csl.2024.101715
Yael Segal-Feldman, Kasia Hitczenko, Matthew Goldrick, Adam Buchwald, Angela Roberts, Joseph Keshet
{"title":"Enhancing analysis of diadochokinetic speech using deep neural networks","authors":"Yael Segal-Feldman ,&nbsp;Kasia Hitczenko ,&nbsp;Matthew Goldrick ,&nbsp;Adam Buchwald ,&nbsp;Angela Roberts ,&nbsp;Joseph Keshet","doi":"10.1016/j.csl.2024.101715","DOIUrl":"10.1016/j.csl.2024.101715","url":null,"abstract":"<div><p>Diadochokinetic speech tasks (DDK) involve the repetitive production of consonant-vowel syllables. These tasks are useful in detecting impairments, differential diagnosis, and monitoring progress in speech-motor impairments. However, manual analysis of those tasks is time-consuming, subjective, and provides only a rough picture of speech. This paper presents several deep neural network models working on the raw waveform for the automatic segmentation of stop consonants and vowels from unannotated and untranscribed speech. A deep encoder serves as a features extractor module, replacing conventional signal processing features. In this context, diverse deep learning architectures, such as convolutional neural networks (CNNs) and large self-supervised models like HuBERT, are applied for the extraction process. A decoder model uses derived embeddings to identify frame types. Consequently, the paper studies diverse deep architectures, ranging from linear layers, LSTM, CNN, and transformers. These architectures are assessed for their ability to detect speech rate, sound duration, and boundary locations on a dataset of healthy individuals and an unseen dataset of older individuals with Parkinson’s Disease. The results reveal that an LSTM model performs better than all other models on both datasets and is comparable to trained human annotators.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142151014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
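The pipeline in the diadochokinetic-speech abstract, a pre-trained speech encoder that yields frame embeddings followed by a sequence model that labels each frame, can be sketched with the Hugging Face HuBERT encoder and a bidirectional LSTM head. The label set, hidden sizes, and checkpoint name below are illustrative assumptions, not the authors' released configuration.

```python
import torch
import torch.nn as nn
from transformers import HubertModel

class FrameClassifier(nn.Module):
    """HuBERT frame embeddings -> BiLSTM -> per-frame class logits."""
    def __init__(self, num_classes: int = 3):            # e.g. silence / vowel / stop (assumed)
        super().__init__()
        self.encoder = HubertModel.from_pretrained("facebook/hubert-base-ls960")
        self.lstm = nn.LSTM(input_size=768, hidden_size=256,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 256, num_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) of 16 kHz audio.
        frames = self.encoder(waveform).last_hidden_state    # (batch, n_frames, 768)
        hidden, _ = self.lstm(frames)
        return self.head(hidden)                              # (batch, n_frames, num_classes)

logits = FrameClassifier()(torch.randn(1, 16000))             # one second of dummy audio
```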
Copiously Quote Classics: Improving Chinese Poetry Generation with historical allusion knowledge
IF 3.1, CAS Tier 3, Computer Science
Computer Speech and Language, Pub Date: 2024-08-30, DOI: 10.1016/j.csl.2024.101708
Zhonghe Han, Jintao Liu, Yuanben Zhang, Lili Zhang, Lei Wang, Zequn Zhang, Zhihao Zhao, Zhenyu Huang
{"title":"Copiously Quote Classics: Improving Chinese Poetry Generation with historical allusion knowledge","authors":"Zhonghe Han ,&nbsp;Jintao Liu ,&nbsp;Yuanben Zhang ,&nbsp;Lili Zhang ,&nbsp;Lei Wang ,&nbsp;Zequn Zhang ,&nbsp;Zhihao Zhao ,&nbsp;Zhenyu Huang","doi":"10.1016/j.csl.2024.101708","DOIUrl":"10.1016/j.csl.2024.101708","url":null,"abstract":"<div><p>Integrating allusions into poems is an advanced form of human poetry writing, which could clearly express the author’s thoughts and arouse the resonance of readers. However, existing poetry generation works mainly focus on improving the coherence and fluency of poetry, while generating poems with allusion knowledge is rarely considered. To solve this issue, we propose an <strong>A</strong>llusion-aware <strong>C</strong>hinese <strong>P</strong>oetry <strong>G</strong>eneration (ACPG) framework in this study. Concretely, we first release an <strong>A</strong>llusion-<strong>E</strong>nriched <strong>P</strong>oetry (AEP) dataset by linking poems with historical allusions, which might enable a new research direction for poetry generation. Based on this dataset, we design a three-stage learning mechanism to encourage the training stage under a low-resource setting, which can effectively exploit the knowledge of large-scale poetry and allusion data to generate informative allusive poems. Extensive experiments demonstrate the effectiveness of ACPG among a series of proposed baselines. Moreover, the proposed ACPG framework can also be applied to lyrics generation or other controlled text generation tasks, which can incorporate allusion knowledge into the generated results and enhance the meaning and quality of the texts.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142151013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Significance of chirp MFCC as a feature in speech and audio applications
IF 3.1, CAS Tier 3, Computer Science
Computer Speech and Language, Pub Date: 2024-08-22, DOI: 10.1016/j.csl.2024.101713
S. Johanan Joysingh, P. Vijayalakshmi, T. Nagarajan
{"title":"Significance of chirp MFCC as a feature in speech and audio applications","authors":"S. Johanan Joysingh ,&nbsp;P. Vijayalakshmi ,&nbsp;T. Nagarajan","doi":"10.1016/j.csl.2024.101713","DOIUrl":"10.1016/j.csl.2024.101713","url":null,"abstract":"<div><p>A novel feature, based on the chirp z-transform, that offers an improved representation of the underlying true spectrum is proposed. This feature, the chirp MFCC, is derived by computing the Mel frequency cepstral coefficients from the chirp magnitude spectrum, instead of the Fourier transform magnitude spectrum. The theoretical foundations for the proposal, and the experimental validation using product of likelihood Gaussians, to show the improved class separation offered by the proposed chirp MFCC, when compared with basic MFCC are discussed. Further, real world evaluation of the feature is performed using three diverse tasks, namely, speech–music classification, speaker identification, and speech commands recognition. It is shown in all three tasks that the proposed chirp MFCC offers considerable improvements.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000962/pdfft?md5=9eea65049758593f74e943bfcd89ac3f&pid=1-s2.0-S0885230824000962-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142077196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
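The construction described in the chirp MFCC abstract, computing MFCCs from a chirp z-transform magnitude spectrum rather than the FFT magnitude spectrum, can be sketched with SciPy's czt (SciPy 1.8+) and librosa's mel filter bank. The evaluation radius, frame length, and filter-bank settings below are illustrative assumptions; the paper defines its own configuration.

```python
import numpy as np
import librosa
from scipy.signal import czt
from scipy.fftpack import dct

def chirp_mfcc_frame(frame: np.ndarray, sr: int = 16000, n_fft: int = 512,
                     radius: float = 0.98, n_mels: int = 26, n_mfcc: int = 13):
    """MFCCs of one windowed frame, using a chirp z-transform spectrum.

    The z-transform is evaluated on a circle of the given radius instead of
    the unit circle (radius=1.0 would reproduce the ordinary DFT).
    """
    # n_fft equally spaced points on the circle |z| = radius.
    spectrum = czt(frame, m=n_fft, w=np.exp(-2j * np.pi / n_fft), a=radius)
    mag = np.abs(spectrum[: n_fft // 2 + 1])             # keep the non-negative half

    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    log_mel = np.log(mel_fb @ mag + 1e-10)
    return dct(log_mel, type=2, norm="ortho")[:n_mfcc]

frame = np.hamming(400) * np.random.randn(400)           # one 25 ms frame at 16 kHz
coeffs = chirp_mfcc_frame(frame)
```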
Artificial disfluency detection, uh no, disfluency generation for the masses
IF 3.1, CAS Tier 3, Computer Science
Computer Speech and Language, Pub Date: 2024-08-17, DOI: 10.1016/j.csl.2024.101711
Tatiana Passali, Thanassis Mavropoulos, Grigorios Tsoumakas, Georgios Meditskos, Stefanos Vrochidis
{"title":"Artificial disfluency detection, uh no, disfluency generation for the masses","authors":"Tatiana Passali ,&nbsp;Thanassis Mavropoulos ,&nbsp;Grigorios Tsoumakas ,&nbsp;Georgios Meditskos ,&nbsp;Stefanos Vrochidis","doi":"10.1016/j.csl.2024.101711","DOIUrl":"10.1016/j.csl.2024.101711","url":null,"abstract":"<div><p>Existing approaches for disfluency detection typically require the existence of large annotated datasets. However, current datasets for this task are limited, suffer from class imbalance, and lack some types of disfluencies that are encountered in real-world scenarios. At the same time, augmentation techniques for disfluency detection are not able to model complex types of disfluencies. This limits such approaches to only performing pre-training since the generated data are not indicative of disfluencies that occur in real scenarios and, as a result, cannot be directly used for training disfluency detection models, as we experimentally demonstrate. This imposes significant constraints on the usefulness of such approaches in practice since real disfluencies still have to be collected in order to train the models. In this work, we propose Large-scale ARtificial Disfluency Generation (LARD), a method for automatically generating artificial disfluencies, and more specifically repairs, from fluent text. Unlike existing augmentation techniques, LARD can simulate all the different and complex types of disfluencies. In addition, it incorporates contextual embeddings into the disfluency generation to produce realistic, context-aware artificial disfluencies. LARD can be used effectively for training disfluency detection models, bypassing the requirement of annotated disfluent data. Our empirical evaluation shows that LARD outperforms existing rule-based augmentation methods and increases the accuracy of existing disfluency detectors. In addition, experiments demonstrate that the proposed method can be effectively used in a low-resource setup.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000949/pdfft?md5=3e3442312f5819775b9ad09e131a9dd3&pid=1-s2.0-S0885230824000949-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142097432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
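A toy version of the "repair" disfluencies that LARD generates can be sketched as follows: pick a word in a fluent sentence and insert an incorrect candidate, optionally followed by a filler, just before it, so the original word reads as the speaker's self-correction. LARD selects candidates with contextual embeddings; the random candidate picker here is a deliberately simplified stand-in, and all names and the filler list are illustrative.

```python
import random

INTERREGNA = ["uh", "I mean", "sorry"]          # illustrative fillers

def add_repair(tokens: list[str], candidates: list[str], rng: random.Random) -> list[str]:
    """Insert a repair: a wrong word (plus optional filler) before the true word."""
    pos = rng.randrange(len(tokens))            # the word that becomes the correction
    reparandum = rng.choice(candidates)         # the "mistake" (LARD picks this contextually)
    inserted = [reparandum]
    if rng.random() < 0.5:
        inserted.append(rng.choice(INTERREGNA))
    return tokens[:pos] + inserted + tokens[pos:]

rng = random.Random(0)
fluent = "show me flights to boston on friday".split()
print(" ".join(add_repair(fluent, candidates=["chicago", "denver"], rng=rng)))
# e.g. "show me flights to chicago uh boston on friday"
```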