{"title":"Chinese Named Entity Recognition based on adaptive lexical weights","authors":"Yaping Xu , Mengtao Ying , Kunyu Fang, Ruixing Ming","doi":"10.1016/j.csl.2024.101735","DOIUrl":"10.1016/j.csl.2024.101735","url":null,"abstract":"<div><div>Currently, many researchers use weights to merge self-matched words obtained through dictionary matching in order to enhance the performance of Named Entity Recognition (NER). However, these studies overlook the relationship between words and sentences when calculating lexical weights, resulting in fused word information that often does not align with the intended meaning of the sentence. To address this issue and enhance prediction performance, we propose an adaptive approach for determining lexical weights. Given a sentence, we utilize an enhanced global attention mechanism to compute the correlation between self-matched words and the sentence, thereby focusing attention on crucial words while disregarding unreliable portions. Experimental results demonstrate that our proposed model outperforms existing state-of-the-art methods for Chinese NER on the MSRA, Weibo, and Resume datasets.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101735"},"PeriodicalIF":3.1,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
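The adaptive-weighting idea above — scoring each self-matched word against the whole sentence before fusing it in — can be sketched as follows. This is a minimal illustration under assumptions: the dot-product relevance scoring and the function names are mine, not the paper's enhanced global attention mechanism.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_lexical_fusion(char_vec, word_vecs, sent_vec, tau=1.0):
    """Weight each self-matched word by its relevance to the sentence
    context, then fuse the weighted word information into the character
    representation. Illustrative sketch, not the paper's exact model."""
    # relevance of each matched word to the sentence vector
    scores = word_vecs @ sent_vec / tau
    weights = softmax(scores)          # adaptive lexical weights, sum to 1
    fused_words = weights @ word_vecs  # weighted sum of word embeddings
    return char_vec + fused_words, weights
```

Words that correlate poorly with the sentence receive near-zero weight, so their (possibly misleading) lexicon information is effectively ignored.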
{"title":"Measuring and implementing lexical alignment: A systematic literature review","authors":"Sumit Srivastava , Suzanna D. Wentzel , Alejandro Catala , Mariët Theune","doi":"10.1016/j.csl.2024.101731","DOIUrl":"10.1016/j.csl.2024.101731","url":null,"abstract":"<div><div>Lexical alignment is a phenomenon often found in human–human conversations, where the interlocutors converge during a conversation to use the same terms and phrases for the same underlying concepts. Linguistic alignment in general is a mechanism humans use to communicate better, operating at various levels of linguistic knowledge and features, of which the lexical level is one. The existing literature suggests that alignment plays a significant role in communication between humans, and is also beneficial in human–agent communication. Various methods have been proposed in the past to measure lexical alignment in human–human conversations, and also to implement it in conversational agents. In this research, we analyse the existing methods for measuring lexical alignment and also dissect methods for implementing it in a conversational agent to personalize human–agent interactions. We propose a new set of criteria that such methods should meet and discuss possible improvements to existing methods.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101731"},"PeriodicalIF":3.1,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142553127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hybrid approach to Natural Language Inference for the SICK dataset","authors":"Rodrigo Souza, Marcos Lopes","doi":"10.1016/j.csl.2024.101736","DOIUrl":"10.1016/j.csl.2024.101736","url":null,"abstract":"<div><div>Natural Language Inference (NLI) can be described as the task of answering whether a short text called the <em>Hypothesis</em> (H) can be inferred from another text called the <em>Premise</em> (P) (Poliak, 2020; Dagan et al., 2013). Affirmative answers are considered semantic entailments, and negative ones are either contradictions or semantically “neutral” statements. In the last three decades, many Natural Language Processing (NLP) methods have been put to use for solving this task. As with almost every other NLP task, Deep Learning (DL) techniques in general (and Transformer neural networks in particular) have achieved the best results in recent years, progressively widening the gap over classical, symbolic Knowledge Representation models in solving NLI.</div><div>Nevertheless, however successful DL models are in measurable results like accuracy and F-score, their outcomes are far from explicable, and this is an undesirable feature especially in a task such as NLI, which is meant to deal with language understanding together with the rational reasoning inherent to entailment and contradiction judgements. It is therefore tempting to evaluate how more explainable models perform in NLI and to compare their performance with that of DL models.</div><div>This paper puts forth a pipeline that we call IsoLex. It provides explainable, transparent NLP models for NLI. It has been tested on a partial version of the SICK corpus (Marelli, 2014) called SICK-CE, containing only the contradiction and entailment pairs (4245 in total), thus leaving aside the neutral pairs in an attempt to concentrate on unambiguous semantic relationships, which arguably favor the intelligibility of the results.</div><div>The pipeline consists of three serialized, commonly used NLP models: first, an Isolation Forest module filters off highly dissimilar Premise-Hypothesis pairs; second, a WordNet-based Lexical Relations module checks whether the Premise and Hypothesis textual contents are related to each other in terms of synonymy, hyperonymy, or holonymy; finally, similarities between Premise and Hypothesis texts are evaluated by a simple cosine similarity function based on Word2Vec embeddings.</div><div>IsoLex has achieved 92% accuracy and 94% F1 on SICK-CE. This is close to SOTA models for this kind of task, such as RoBERTa with 98% accuracy and 99% F1 on the same dataset.</div><div>The small performance gap between IsoLex and SOTA DL models is largely compensated by intelligibility at every step of the proposed pipeline. At any time it is possible to evaluate the role of similarity, lexical relatedness, and so forth in the overall process of inference.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101736"},"PeriodicalIF":3.1,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142441799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
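The final stage of the pipeline described above — cosine similarity between embedding-based sentence vectors — can be sketched as follows. This is illustrative only: the mean-pooled sentence vectors, toy embedding table, and the names `entailment_score`/`sentence_vector` are assumptions; a real run would use trained Word2Vec embeddings and combine this with the Isolation Forest and WordNet stages.

```python
import numpy as np

def cosine_similarity(u, v):
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0  # treat all-OOV sentences as unrelated
    return float(u @ v / (nu * nv))

def sentence_vector(tokens, embeddings, dim):
    # mean of the known token embeddings; zero vector if every token is OOV
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def entailment_score(premise, hypothesis, embeddings, dim):
    """Cosine similarity between mean-pooled Premise and Hypothesis
    vectors -- the last, fully inspectable step of the pipeline."""
    return cosine_similarity(sentence_vector(premise, embeddings, dim),
                             sentence_vector(hypothesis, embeddings, dim))
```

Because each stage exposes a plain number (a similarity, a lexical-relation flag, an outlier score), the contribution of every component to the final decision remains inspectable.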
{"title":"DSTM: A transformer-based model with dynamic-static feature fusion in speech emotion recognition","authors":"Guowei Jin, Yunfeng Xu, Hong Kang, Jialin Wang, Borui Miao","doi":"10.1016/j.csl.2024.101733","DOIUrl":"10.1016/j.csl.2024.101733","url":null,"abstract":"<div><div>With the support of multi-head attention, the Transformer shows remarkable results in speech emotion recognition. However, existing models still suffer from the inability to accurately locate important regions in semantic information at different time scales. To address this problem, we propose a Transformer-based network model for dynamic-static feature fusion, composed of a locally dynamic multi-head attention module and a global static attention module. The locally dynamic multi-head attention module adapts the attention window sizes and window centers of different regions through speech samples and learnable parameters, enabling the model to adaptively discover and attend to valuable information embedded in speech. The global static attention module enables the model to use every element in the sequence fully and to learn critical global feature information by establishing connections over the entire input sequence. We also train the model with a data-mixture method and introduce the center loss function to supervise training, which speeds up model fitting and alleviates the sample imbalance problem to a certain extent. This method achieved good performance on the IEMOCAP and MELD datasets, showing that the proposed model structure and method offer better accuracy and robustness.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101733"},"PeriodicalIF":3.1,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FE-CFNER: Feature Enhancement-based approach for Chinese Few-shot Named Entity Recognition","authors":"Sanhe Yang, Peichao Lai, Ruixiong Fang, Yanggeng Fu, Feiyang Ye, Yilei Wang","doi":"10.1016/j.csl.2024.101730","DOIUrl":"10.1016/j.csl.2024.101730","url":null,"abstract":"<div><div>Although significant progress has been made in Chinese Named Entity Recognition (NER) methods based on deep learning, their performance often falls short in few-shot scenarios. Feature enhancement is considered a promising approach to address the issue of Chinese few-shot NER. However, traditional feature fusion methods tend to lead to the loss of important information and the integration of irrelevant information. Despite the benefits of incorporating BERT for improving entity recognition, its performance is limited when training data is insufficient. To tackle these challenges, this paper proposes a Feature Enhancement-based approach for Chinese Few-shot NER called FE-CFNER. FE-CFNER designs a double cross neural network to minimize information loss through two rounds of feature crossing. Additionally, adaptive weights and a top-<span><math><mi>k</mi></math></span> mechanism are introduced to sparsify attention distributions, enabling the model to prioritize important information related to entities while excluding irrelevant information. To further enhance the quality of BERT embeddings, FE-CFNER employs a contrastive template for contrastive-learning pre-training of BERT, enhancing BERT’s semantic understanding capability. We evaluate the proposed method on four sampled Chinese NER datasets: Weibo, Resume, Taobao, and Youku. Experimental results validate the effectiveness and superiority of FE-CFNER in Chinese few-shot NER tasks.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101730"},"PeriodicalIF":3.1,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
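The top-k sparsification of attention distributions mentioned in the abstract can be sketched as follows. Illustrative only: FE-CFNER's adaptive weights are learned, whereas this keeps the k largest raw logits per row and renormalizes, zeroing out everything else.

```python
import numpy as np

def topk_sparse_attention(scores, k):
    """Keep only the k largest attention logits per row, then apply
    softmax over those positions; all other positions get exactly zero
    weight, so the model attends only to the most relevant tokens."""
    mask = np.full_like(scores, -np.inf)
    idx = np.argsort(scores, axis=-1)[:, -k:]  # indices of top-k logits
    np.put_along_axis(mask, idx, np.take_along_axis(scores, idx, -1), -1)
    e = np.exp(mask - mask.max(axis=-1, keepdims=True))  # exp(-inf) -> 0
    return e / e.sum(axis=-1, keepdims=True)
```

Unlike plain softmax, which assigns every position a small but nonzero weight, this hard cutoff guarantees that irrelevant positions contribute nothing to the fused representation.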
{"title":"Spoofing countermeasure for fake speech detection using brute force features","authors":"Arsalan Rahman Mirza , Abdulbasit K. Al-Talabani","doi":"10.1016/j.csl.2024.101732","DOIUrl":"10.1016/j.csl.2024.101732","url":null,"abstract":"<div><div>Due to the progress in deep learning technology, techniques that generate spoofed speech have advanced significantly. Such synthetic speech can be exploited for harmful purposes, like impersonation or disseminating false information. Researchers in the area are investigating which features are useful for spoof detection. This paper extensively investigates three problems in spoof detection in speech: the imbalanced number of samples per class, which may negatively affect the performance of any detection model; the effect of early and late feature fusion; and the behaviour of the model on unseen attacks. Regarding the imbalance issue, we propose two approaches: a Synthetic Minority Over-sampling Technique (SMOTE)-based model and a Bootstrap-based model. We use the openSMILE toolkit to extract different feature sets and investigate their individual results as well as their early and late fusion. The experiments are evaluated on the ASVspoof 2019 datasets, which encompass synthetic, voice-conversion, and replayed speech samples. Additionally, Support Vector Machine (SVM) and Deep Neural Network (DNN) classifiers are adopted. The outcomes of various test scenarios indicate that neither the imbalance-handling models nor any specific feature or fusion outperformed the brute-force version of the model, as the best Equal Error Rate (EER) achieved by the Imbalance model is 6.67% and 1.80% for Logical Access (LA) and Physical Access (PA), respectively.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101732"},"PeriodicalIF":3.1,"publicationDate":"2024-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
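The SMOTE side of the imbalance handling can be sketched as follows: synthesize new minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. This is a from-scratch illustration of the oversampling idea, not the paper's actual setup (which works on openSMILE feature sets with existing tooling).

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples. Each one lies on the
    line segment between a random minority sample and one of its k
    nearest minority-class neighbours (Euclidean distance)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # pairwise distances within the minority class; ignore self-distance
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    synth = []
    for _ in range(n_new):
        i = rng.integers(n)                  # random minority sample
        j = neighbours[i, rng.integers(k)]   # one of its k neighbours
        gap = rng.random()                   # interpolation factor in [0, 1)
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(synth)
```

Because every synthetic point is a convex combination of two real minority samples, the oversampled class stays inside the original feature-space region rather than duplicating exact points as plain bootstrapping does.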
{"title":"A language-agnostic model of child language acquisition","authors":"Louis Mahon , Omri Abend , Uri Berger , Katherine Demuth , Mark Johnson , Mark Steedman","doi":"10.1016/j.csl.2024.101714","DOIUrl":"10.1016/j.csl.2024.101714","url":null,"abstract":"<div><div>This work reimplements a recent semantic bootstrapping child language acquisition (CLA) model, which was originally designed for English, and trains it to learn a new language: Hebrew. The model learns from pairs of utterances and logical forms as meaning representations, and acquires both syntax and word meanings simultaneously. The results show that the model mostly transfers to Hebrew, but that a number of factors, including the richer morphology in Hebrew, makes the learning slower and less robust. This suggests that a clear direction for future work is to enable the model to leverage the similarities between different word forms.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101714"},"PeriodicalIF":3.1,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evidence and Axial Attention Guided Document-level Relation Extraction","authors":"Jiawei Yuan , Hongyong Leng , Yurong Qian , Jiaying Chen , Mengnan Ma , Shuxiang Hou","doi":"10.1016/j.csl.2024.101728","DOIUrl":"10.1016/j.csl.2024.101728","url":null,"abstract":"<div><div>Document-level Relation Extraction (DocRE) aims to identify semantic relations among multiple entity pairs within a document. Most previous DocRE methods take the entire document as input. However, for human annotators, a small subset of sentences in the document, namely the evidence, is sufficient to infer the relation of an entity pair. Additionally, a document usually contains multiple entities scattered throughout various locations of the document. Previous models use these entities independently, ignoring the global interdependency among relation triples. To handle the above issues, we propose a novel framework, EAAGRE (Evidence and Axial Attention Guided Relation Extraction). First, we use human-annotated evidence labels to supervise the attention module of the DocRE system, making the model pay attention to the evidence sentences rather than others. Second, we construct an entity-level relation matrix and use axial attention to capture the global interactions among entity pairs. By doing so, we further extract the relations that require multiple entity pairs for prediction. We conduct various experiments on DocRED and achieve improvements over baseline models, verifying the effectiveness of our model.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101728"},"PeriodicalIF":3.1,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
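The axial-attention idea — attending first along rows and then along columns of the entity-level relation matrix, so that pair (i, j) can aggregate evidence from pairs sharing its head entity (i, *) or tail entity (*, j) — can be sketched as follows. Single head, no learned projections; purely illustrative, not EAAGRE's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(M):
    """Weight-free sketch of axial attention over an entity-pair
    feature matrix M of shape (n, n, d): scaled dot-product attention
    along each row, then along each column."""
    def attend(x):
        # attention among positions of the second axis, per row
        scores = np.einsum('...id,...jd->...ij', x, x) / np.sqrt(x.shape[-1])
        return np.einsum('...ij,...jd->...id', softmax(scores), x)
    M = attend(M)                                         # row-wise pass
    M = attend(M.transpose(1, 0, 2)).transpose(1, 0, 2)   # column-wise pass
    return M
```

Two axial passes cost O(n³) instead of the O(n⁴) of full attention over all n² pairs, which is what makes the entity-level matrix tractable.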
{"title":"SPNet: A Serial and Parallel Convolutional Neural Network algorithm for the cross-language coreference resolution","authors":"Zixi Jia , Tianli Zhao , Jingyu Ru , Yanxiang Meng , Bing Xia","doi":"10.1016/j.csl.2024.101729","DOIUrl":"10.1016/j.csl.2024.101729","url":null,"abstract":"<div><div>Current models of coreference resolution often neglect the importance of hidden feature extraction, accurate scoring framework design, and the long-term influence of preceding potential antecedents on future decision-making. However, these aspects play vital roles in scoring the likelihood of coreference between an anaphor and its real antecedent. In this paper, we present a novel model named Serial and Parallel Convolutional Neural Network (SPNet). Based on the SPNet, two kinds of resolvers are proposed. Given the characteristics of reinforcement learning, we combine the reinforcement learning framework with the SPNet to solve the problem of Chinese zero pronoun resolution. Moreover, we fine-tune the SPNet and propose a new resolver combined with an end-to-end framework to solve the problem of coreference resolution. The experiments are conducted on the CoNLL-2012 dataset and the results show that our model is effective. Our model achieves excellent performance in the Chinese zero pronoun resolution task. Compared with our baseline, it also achieves an improvement of 0.3% in the coreference resolution task.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"91 ","pages":"Article 101729"},"PeriodicalIF":3.1,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142747745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech Generation for Indigenous Language Education","authors":"Aidan Pine , Erica Cooper , David Guzmán , Eric Joanis , Anna Kazantseva , Ross Krekoski , Roland Kuhn , Samuel Larkin , Patrick Littell , Delaney Lothian , Akwiratékha’ Martin , Korin Richmond , Marc Tessier , Cassia Valentini-Botinhao , Dan Wells , Junichi Yamagishi","doi":"10.1016/j.csl.2024.101723","DOIUrl":"10.1016/j.csl.2024.101723","url":null,"abstract":"<div><div>As the quality of contemporary speech synthesis improves, so too does the interest from language communities in developing text-to-speech (TTS) systems for a variety of real-world applications. Much of the work on TTS has focused on high-resource languages, resulting in implicitly resource-intensive paths to building such systems. The goal of this paper is to provide signposts and points of reference for future low-resource speech synthesis efforts, with insights drawn from the Speech Generation for Indigenous Language Education (SGILE) project. Funded and coordinated by the National Research Council of Canada (NRC), this multi-year, multi-partner project has the goal of producing high-quality text-to-speech systems that support the teaching of Indigenous languages in a variety of educational contexts. We provide background information and motivation for the project, as well as details about our approach and project structure, including results from a multi-day requirements-gathering session. We discuss some of our key challenges, including building models with appropriate controls for educators, improving model data efficiency, and strategies for low-resource transfer learning and evaluation. Finally, we provide a detailed survey of existing speech synthesis software and introduce EveryVoice TTS, a toolkit designed specifically for low-resource speech synthesis.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101723"},"PeriodicalIF":3.1,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}