{"title":"Chinese Named Entity Recognition based on adaptive lexical weights","authors":"Yaping Xu , Mengtao Ying , Kunyu Fang, Ruixing Ming","doi":"10.1016/j.csl.2024.101735","DOIUrl":"10.1016/j.csl.2024.101735","url":null,"abstract":"<div><div>Currently, many researchers use weights to merge self-matched words obtained through dictionary matching in order to enhance the performance of Named Entity Recognition (NER). However, these studies overlook the relationship between words and sentences when calculating lexical weights, resulting in fused word information that often does not align with the intended meaning of the sentence. To address this issue and enhance prediction performance, we propose an adaptive approach for determining lexical weights. Given a sentence, we utilize an enhanced global attention mechanism to compute the correlation between self-matched words and the sentence, thereby focusing attention on crucial words while disregarding unreliable portions. Experimental results demonstrate that our proposed model outperforms existing state-of-the-art methods for Chinese NER on the MSRA, Weibo, and Resume datasets.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101735"},"PeriodicalIF":3.1,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
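The adaptive-weighting idea above — scoring each self-matched word against the whole sentence before fusing it in — can be sketched as follows. This is a minimal illustration under assumptions: the dot-product relevance scoring and the function names are mine, not the paper's enhanced global attention mechanism.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_lexical_fusion(char_vec, word_vecs, sent_vec, tau=1.0):
    """Weight each self-matched word by its relevance to the sentence
    context, then fuse the weighted word information into the character
    representation. Illustrative sketch, not the paper's exact model."""
    # relevance of each matched word to the sentence vector
    scores = word_vecs @ sent_vec / tau
    weights = softmax(scores)          # adaptive lexical weights, sum to 1
    fused_words = weights @ word_vecs  # weighted sum of word embeddings
    return char_vec + fused_words, weights
```

Words that correlate poorly with the sentence receive near-zero weight, so their (possibly misleading) lexicon information is effectively ignored.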
{"title":"Measuring and implementing lexical alignment: A systematic literature review","authors":"Sumit Srivastava , Suzanna D. Wentzel , Alejandro Catala , Mariët Theune","doi":"10.1016/j.csl.2024.101731","DOIUrl":"10.1016/j.csl.2024.101731","url":null,"abstract":"<div><div>Lexical alignment is a phenomenon often found in human–human conversations, where the interlocutors converge during a conversation to use the same terms and phrases for the same underlying concepts. Linguistic alignment in general is a mechanism humans use to communicate better, operating at various levels of linguistic knowledge and features, of which the lexical level is one. The existing literature suggests that alignment plays a significant role in communication between humans, and is also beneficial in human–agent communication. Various methods have been proposed in the past to measure lexical alignment in human–human conversations, and also to implement it in conversational agents. In this research, we analyse the existing methods for measuring lexical alignment and also dissect methods for implementing it in a conversational agent to personalize human–agent interactions. We propose a new set of criteria that such methods should meet and discuss possible improvements to existing methods.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101731"},"PeriodicalIF":3.1,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142553127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hybrid approach to Natural Language Inference for the SICK dataset","authors":"Rodrigo Souza, Marcos Lopes","doi":"10.1016/j.csl.2024.101736","DOIUrl":"10.1016/j.csl.2024.101736","url":null,"abstract":"<div><div>Natural Language Inference (NLI) can be described as the task of answering whether a short text called the <em>Hypothesis</em> (H) can be inferred from another text called the <em>Premise</em> (P) (Poliak, 2020; Dagan et al., 2013). Affirmative answers are considered semantic entailments, and negative ones are either contradictions or semantically “neutral” statements. In the last three decades, many Natural Language Processing (NLP) methods have been put to use for solving this task. As with almost every other NLP task, Deep Learning (DL) techniques in general (and Transformer neural networks in particular) have achieved the best results in recent years, progressively widening the gap over classical, symbolic Knowledge Representation models in solving NLI.</div><div>Nevertheless, however successful DL models are in measurable results like accuracy and F-score, their outcomes are far from explicable, and this is an undesirable feature especially in a task such as NLI, which is meant to deal with language understanding together with the rational reasoning inherent to entailment and contradiction judgements. It is therefore tempting to evaluate how more explainable models perform in NLI and to compare their performance with that of DL models.</div><div>This paper puts forth a pipeline that we call IsoLex. It provides explainable, transparent NLP models for NLI. It has been tested on a partial version of the SICK corpus (Marelli, 2014) called SICK-CE, containing only the contradiction and entailment pairs (4245 in total), thus leaving aside the neutral pairs in an attempt to concentrate on unambiguous semantic relationships, which arguably favor the intelligibility of the results.</div><div>The pipeline consists of three serialized, commonly used NLP models: first, an Isolation Forest module filters off highly dissimilar Premise-Hypothesis pairs; second, a WordNet-based Lexical Relations module checks whether the Premise and Hypothesis textual contents are related to each other in terms of synonymy, hyperonymy, or holonymy; finally, similarities between Premise and Hypothesis texts are evaluated by a simple cosine similarity function based on Word2Vec embeddings.</div><div>IsoLex has achieved 92% accuracy and 94% F1 on SICK-CE. This is close to SOTA models for this kind of task, such as RoBERTa with 98% accuracy and 99% F1 on the same dataset.</div><div>The small performance gap between IsoLex and SOTA DL models is largely compensated by intelligibility at every step of the proposed pipeline. At any time it is possible to evaluate the role of similarity, lexical relatedness, and so forth in the overall process of inference.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101736"},"PeriodicalIF":3.1,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142441799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
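The final stage of the pipeline described above — cosine similarity between embedding-based sentence vectors — can be sketched as follows. This is illustrative only: the mean-pooled sentence vectors, toy embedding table, and the names `entailment_score`/`sentence_vector` are assumptions; a real run would use trained Word2Vec embeddings and combine this with the Isolation Forest and WordNet stages.

```python
import numpy as np

def cosine_similarity(u, v):
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0  # treat all-OOV sentences as unrelated
    return float(u @ v / (nu * nv))

def sentence_vector(tokens, embeddings, dim):
    # mean of the known token embeddings; zero vector if every token is OOV
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def entailment_score(premise, hypothesis, embeddings, dim):
    """Cosine similarity between mean-pooled Premise and Hypothesis
    vectors -- the last, fully inspectable step of the pipeline."""
    return cosine_similarity(sentence_vector(premise, embeddings, dim),
                             sentence_vector(hypothesis, embeddings, dim))
```

Because each stage exposes a plain number (a similarity, a lexical-relation flag, an outlier score), the contribution of every component to the final decision remains inspectable.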
{"title":"DSTM: A transformer-based model with dynamic-static feature fusion in speech emotion recognition","authors":"Guowei Jin, Yunfeng Xu, Hong Kang, Jialin Wang, Borui Miao","doi":"10.1016/j.csl.2024.101733","DOIUrl":"10.1016/j.csl.2024.101733","url":null,"abstract":"<div><div>With the support of multi-head attention, the Transformer shows remarkable results in speech emotion recognition. However, existing models still suffer from the inability to accurately locate important regions in semantic information at different time scales. To address this problem, we propose a Transformer-based network model for dynamic-static feature fusion, composed of a locally dynamic multi-head attention module and a global static attention module. The locally dynamic multi-head attention module adapts the attention window sizes and window centers of different regions through speech samples and learnable parameters, enabling the model to adaptively discover and attend to valuable information embedded in speech. The global static attention module enables the model to use every element in the sequence fully and to learn critical global feature information by establishing connections over the entire input sequence. We also train the model with a data-mixture method and introduce the center loss function to supervise training, which speeds up model fitting and alleviates the sample imbalance problem to a certain extent. This method achieved good performance on the IEMOCAP and MELD datasets, showing that the proposed model structure and method offer better accuracy and robustness.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101733"},"PeriodicalIF":3.1,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FE-CFNER: Feature Enhancement-based approach for Chinese Few-shot Named Entity Recognition","authors":"Sanhe Yang, Peichao Lai, Ruixiong Fang, Yanggeng Fu, Feiyang Ye, Yilei Wang","doi":"10.1016/j.csl.2024.101730","DOIUrl":"10.1016/j.csl.2024.101730","url":null,"abstract":"<div><div>Although significant progress has been made in Chinese Named Entity Recognition (NER) methods based on deep learning, their performance often falls short in few-shot scenarios. Feature enhancement is considered a promising approach to address the issue of Chinese few-shot NER. However, traditional feature fusion methods tend to lead to the loss of important information and the integration of irrelevant information. Despite the benefits of incorporating BERT for improving entity recognition, its performance is limited when training data is insufficient. To tackle these challenges, this paper proposes a Feature Enhancement-based approach for Chinese Few-shot NER called FE-CFNER. FE-CFNER designs a double cross neural network to minimize information loss through two rounds of feature crossing. Additionally, adaptive weights and a top-<span><math><mi>k</mi></math></span> mechanism are introduced to sparsify attention distributions, enabling the model to prioritize important information related to entities while excluding irrelevant information. To further enhance the quality of BERT embeddings, FE-CFNER employs a contrastive template for contrastive-learning pre-training of BERT, enhancing BERT’s semantic understanding capability. We evaluate the proposed method on four sampled Chinese NER datasets: Weibo, Resume, Taobao, and Youku. Experimental results validate the effectiveness and superiority of FE-CFNER in Chinese few-shot NER tasks.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101730"},"PeriodicalIF":3.1,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
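The top-k sparsification of attention distributions mentioned in the abstract can be sketched as follows. Illustrative only: FE-CFNER's adaptive weights are learned, whereas this keeps the k largest raw logits per row and renormalizes, zeroing out everything else.

```python
import numpy as np

def topk_sparse_attention(scores, k):
    """Keep only the k largest attention logits per row, then apply
    softmax over those positions; all other positions get exactly zero
    weight, so the model attends only to the most relevant tokens."""
    mask = np.full_like(scores, -np.inf)
    idx = np.argsort(scores, axis=-1)[:, -k:]  # indices of top-k logits
    np.put_along_axis(mask, idx, np.take_along_axis(scores, idx, -1), -1)
    e = np.exp(mask - mask.max(axis=-1, keepdims=True))  # exp(-inf) -> 0
    return e / e.sum(axis=-1, keepdims=True)
```

Unlike plain softmax, which assigns every position a small but nonzero weight, this hard cutoff guarantees that irrelevant positions contribute nothing to the fused representation.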
{"title":"Spoofing countermeasure for fake speech detection using brute force features","authors":"Arsalan Rahman Mirza , Abdulbasit K. Al-Talabani","doi":"10.1016/j.csl.2024.101732","DOIUrl":"10.1016/j.csl.2024.101732","url":null,"abstract":"<div><div>Due to the progress in deep learning technology, techniques that generate spoofed speech have advanced significantly. Such synthetic speech can be exploited for harmful purposes, like impersonation or disseminating false information. Researchers in the area are investigating which features are useful for spoof detection. This paper extensively investigates three problems in spoof detection in speech: the imbalanced number of samples per class, which may negatively affect the performance of any detection model; the effect of early and late feature fusion; and the behaviour of the model on unseen attacks. Regarding the imbalance issue, we propose two approaches: a Synthetic Minority Over-sampling Technique (SMOTE)-based model and a Bootstrap-based model. We use the openSMILE toolkit to extract different feature sets and investigate their individual results as well as their early and late fusion. The experiments are evaluated on the ASVspoof 2019 datasets, which encompass synthetic, voice-conversion, and replayed speech samples. Additionally, Support Vector Machine (SVM) and Deep Neural Network (DNN) classifiers are adopted. The outcomes of various test scenarios indicate that neither the imbalance-handling models nor any specific feature or fusion outperformed the brute-force version of the model, as the best Equal Error Rate (EER) achieved by the Imbalance model is 6.67% and 1.80% for Logical Access (LA) and Physical Access (PA), respectively.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101732"},"PeriodicalIF":3.1,"publicationDate":"2024-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
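The SMOTE side of the imbalance handling can be sketched as follows: synthesize new minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. This is a from-scratch illustration of the oversampling idea, not the paper's actual setup (which works on openSMILE feature sets with existing tooling).

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples. Each one lies on the
    line segment between a random minority sample and one of its k
    nearest minority-class neighbours (Euclidean distance)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # pairwise distances within the minority class; ignore self-distance
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    synth = []
    for _ in range(n_new):
        i = rng.integers(n)                  # random minority sample
        j = neighbours[i, rng.integers(k)]   # one of its k neighbours
        gap = rng.random()                   # interpolation factor in [0, 1)
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(synth)
```

Because every synthetic point is a convex combination of two real minority samples, the oversampled class stays inside the original feature-space region rather than duplicating exact points as plain bootstrapping does.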
{"title":"A language-agnostic model of child language acquisition","authors":"Louis Mahon , Omri Abend , Uri Berger , Katherine Demuth , Mark Johnson , Mark Steedman","doi":"10.1016/j.csl.2024.101714","DOIUrl":"10.1016/j.csl.2024.101714","url":null,"abstract":"<div><div>This work reimplements a recent semantic bootstrapping child language acquisition (CLA) model, which was originally designed for English, and trains it to learn a new language: Hebrew. The model learns from pairs of utterances and logical forms as meaning representations, and acquires both syntax and word meanings simultaneously. The results show that the model mostly transfers to Hebrew, but that a number of factors, including the richer morphology in Hebrew, makes the learning slower and less robust. This suggests that a clear direction for future work is to enable the model to leverage the similarities between different word forms.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101714"},"PeriodicalIF":3.1,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evidence and Axial Attention Guided Document-level Relation Extraction","authors":"Jiawei Yuan , Hongyong Leng , Yurong Qian , Jiaying Chen , Mengnan Ma , Shuxiang Hou","doi":"10.1016/j.csl.2024.101728","DOIUrl":"10.1016/j.csl.2024.101728","url":null,"abstract":"<div><div>Document-level Relation Extraction (DocRE) aims to identify semantic relations among multiple entity pairs within a document. Most previous DocRE methods take the entire document as input. However, for human annotators, a small subset of sentences in the document, namely the evidence, is sufficient to infer the relation of an entity pair. Additionally, a document usually contains multiple entities scattered throughout various locations of the document. Previous models use these entities independently, ignoring the global interdependency among relation triples. To handle the above issues, we propose a novel framework, EAAGRE (Evidence and Axial Attention Guided Relation Extraction). First, we use human-annotated evidence labels to supervise the attention module of the DocRE system, making the model pay attention to the evidence sentences rather than others. Second, we construct an entity-level relation matrix and use axial attention to capture the global interactions among entity pairs. By doing so, we further extract the relations that require multiple entity pairs for prediction. We conduct various experiments on DocRED and achieve improvements over baseline models, verifying the effectiveness of our model.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101728"},"PeriodicalIF":3.1,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
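The axial-attention idea — attending first along rows and then along columns of the entity-level relation matrix, so that pair (i, j) can aggregate evidence from pairs sharing its head entity (i, *) or tail entity (*, j) — can be sketched as follows. Single head, no learned projections; purely illustrative, not EAAGRE's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(M):
    """Weight-free sketch of axial attention over an entity-pair
    feature matrix M of shape (n, n, d): scaled dot-product attention
    along each row, then along each column."""
    def attend(x):
        # attention among positions of the second axis, per row
        scores = np.einsum('...id,...jd->...ij', x, x) / np.sqrt(x.shape[-1])
        return np.einsum('...ij,...jd->...id', softmax(scores), x)
    M = attend(M)                                         # row-wise pass
    M = attend(M.transpose(1, 0, 2)).transpose(1, 0, 2)   # column-wise pass
    return M
```

Two axial passes cost O(n³) instead of the O(n⁴) of full attention over all n² pairs, which is what makes the entity-level matrix tractable.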
{"title":"SPNet: A Serial and Parallel Convolutional Neural Network algorithm for the cross-language coreference resolution","authors":"Zixi Jia , Tianli Zhao , Jingyu Ru , Yanxiang Meng , Bing Xia","doi":"10.1016/j.csl.2024.101729","DOIUrl":"10.1016/j.csl.2024.101729","url":null,"abstract":"<div><div>Current models of coreference resolution often neglect the importance of hidden feature extraction, accurate scoring framework design, and the long-term influence of preceding potential antecedents on future decision-making. However, these aspects play vital roles in scoring the likelihood of coreference between an anaphor and its real antecedent. In this paper, we present a novel model named Serial and Parallel Convolutional Neural Network (SPNet). Based on the SPNet, two kinds of resolvers are proposed. Given the characteristics of reinforcement learning, we combine the reinforcement learning framework with the SPNet to solve the problem of Chinese zero pronoun resolution. Moreover, we fine-tune the SPNet and propose a new resolver combined with an end-to-end framework to solve the problem of coreference resolution. The experiments are conducted on the CoNLL-2012 dataset and the results show that our model is effective. Our model achieves excellent performance in the Chinese zero pronoun resolution task. Compared with our baseline, it also achieves an improvement of 0.3% in the coreference resolution task.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"91 ","pages":"Article 101729"},"PeriodicalIF":3.1,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142747745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech Generation for Indigenous Language Education","authors":"Aidan Pine , Erica Cooper , David Guzmán , Eric Joanis , Anna Kazantseva , Ross Krekoski , Roland Kuhn , Samuel Larkin , Patrick Littell , Delaney Lothian , Akwiratékha’ Martin , Korin Richmond , Marc Tessier , Cassia Valentini-Botinhao , Dan Wells , Junichi Yamagishi","doi":"10.1016/j.csl.2024.101723","DOIUrl":"10.1016/j.csl.2024.101723","url":null,"abstract":"<div><div>As the quality of contemporary speech synthesis improves, so too does the interest from language communities in developing text-to-speech (TTS) systems for a variety of real-world applications. Much of the work on TTS has focused on high-resource languages, resulting in implicitly resource-intensive paths to building such systems. The goal of this paper is to provide signposts and points of reference for future low-resource speech synthesis efforts, with insights drawn from the Speech Generation for Indigenous Language Education (SGILE) project. Funded and coordinated by the National Research Council of Canada (NRC), this multi-year, multi-partner project has the goal of producing high-quality text-to-speech systems that support the teaching of Indigenous languages in a variety of educational contexts. We provide background information and motivation for the project, as well as details about our approach and project structure, including results from a multi-day requirements-gathering session. We discuss some of our key challenges, including building models with appropriate controls for educators, improving model data efficiency, and strategies for low-resource transfer learning and evaluation. Finally, we provide a detailed survey of existing speech synthesis software and introduce EveryVoice TTS, a toolkit designed specifically for low-resource speech synthesis.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101723"},"PeriodicalIF":3.1,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}