{"title":"Modeling correlated causal-effect structure with a hypergraph for document-level event causality identification","authors":"Wei Xiang , Cheng Liu , Bang Wang","doi":"10.1016/j.csl.2024.101752","DOIUrl":"10.1016/j.csl.2024.101752","url":null,"abstract":"<div><div>Document-level event causality identification (ECI) aims to detect causal relations in between event mentions in a document. Existing approaches for document-level event causality identification detect the causal relation for each pair of event mentions independently, while ignoring latent correlated cause–effect structure in a document, i.e., one cause (effect) with multiple effects (causes). We argue that identifying the causal relation of one event pair may facilitate the causality identification for other event pairs. In light of such considerations, we propose to model the correlated causal-effect structure by a hypergraph and jointly identify multiple causal relations with the same cause (effect). In particular, we propose an event-hypergraph neural encoding model, called EHNEM, for document-level event causality identification. In EHNEM, we first initialize event mentions’ embeddings via a pre-trained language model and obtain potential causal relation of each event pair via a multilayer perceptron. To capture causal correlations, we construct a hypergraph by integrating potential causal relations for the same event as a hyperedge. On the constructed event-hypergraph, we use a hypergraph convolutional network to learn the representation of each event node for final causality identification. Experiments on both EventStoryLine corpus and English-MECI corpus show that our EHNEM model significantly outperforms the state-of-the-art algorithms.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101752"},"PeriodicalIF":3.1,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142699470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
You Are What You Write: Author re-identification privacy attacks in the era of pre-trained language models
Richard Plant, Valerio Giuffrida, Dimitra Gkatzia
Computer Speech and Language, vol. 90, Article 101746 (2024-11-16). DOI: 10.1016/j.csl.2024.101746

Abstract: The widespread use of pre-trained language models has revolutionised knowledge transfer in natural language processing tasks. However, there is concern about potential breaches of user trust due to the risk of re-identification attacks, where malicious users could extract Personally Identifiable Information (PII) from other datasets. To assess the extent of extractable personal information in popular pre-trained models, we conduct the first wide-coverage evaluation and comparison of state-of-the-art privacy-preserving algorithms on a large multilingual sentiment-analysis dataset annotated with demographic information (location, age, and gender). Our results suggest a link between model complexity, pre-training data volume, and the efficacy of privacy-preserving embeddings: privacy-preserving methods are more effective when applied to larger and more complex models, with improvements exceeding 20% over non-private baselines. We also observe that local differential privacy imposes serious performance penalties of roughly 20% in our test setting, which can be mitigated using hybrid or metric-DP techniques.
{"title":"End-to-End Speech-to-Text Translation: A Survey","authors":"Nivedita Sethiya, Chandresh Kumar Maurya","doi":"10.1016/j.csl.2024.101751","DOIUrl":"10.1016/j.csl.2024.101751","url":null,"abstract":"<div><div>Speech-to-Text (ST) translation pertains to the task of converting speech signals in one language to text in another language. It finds its application in various domains, such as hands-free communication, dictation, video lecture transcription, and translation, to name a few. Automatic Speech Recognition (ASR), as well as Machine Translation(MT) models, play crucial roles in traditional ST translation, enabling the conversion of spoken language in its original form to written text and facilitating seamless cross-lingual communication. ASR recognizes spoken words, while MT translates the transcribed text into the target language. Such integrated models suffer from cascaded error propagation and high resource and training costs. As a result, researchers have been exploring end-to-end (E2E) models for ST translation. However, to our knowledge, there is no comprehensive review of existing works on E2E ST. The present survey, therefore, discusses the works in this direction. We have attempted to provide a comprehensive review of models employed, metrics, and datasets used for ST tasks, providing challenges and future research direction with new insights. We believe this review will be helpful to researchers working on various applications of ST models.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101751"},"PeriodicalIF":3.1,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142699469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corpus and unsupervised benchmark: Towards Tagalog grammatical error correction
Nankai Lin, Hongbin Zhang, Menglan Shen, Yu Wang, Shengyi Jiang, Aimin Yang
Computer Speech and Language, vol. 91, Article 101750 (2024-11-14). DOI: 10.1016/j.csl.2024.101750

Abstract: Grammatical error correction (GEC) is a challenging task for natural language processing. Much effort has been devoted to GEC for high-resource languages such as English and Chinese, but little for low-resource languages, owing to the lack of large annotated corpora. For low-resource languages, current unsupervised GEC based on language-model scoring performs well, yet pre-trained language models remain underexplored in this context. This study proposes a BERT-based unsupervised GEC framework that primarily addresses word-level errors, viewing GEC as a multi-class classification task. The framework comprises three modules: data-flow construction, sentence perplexity scoring, and error detection and correction. We propose a novel pseudo-perplexity scoring method to evaluate a sentence's probable correctness, and we construct a Tagalog corpus for Tagalog GEC research. The framework obtains competitive performance on the self-constructed Tagalog corpus and on the open-source Indonesian corpus, demonstrating that it is complementary to baseline methods on low-resource GEC tasks. Our corpus is available at https://github.com/GKLMIP/TagalogGEC.
TR-Net: Token Relation Inspired Table Filling Network for Joint Entity and Relation Extraction
Yongle Kong, Zhihao Yang, Zeyuan Ding, Wenfei Liu, Shiqi Zhang, Jianan Xu, Hongfei Lin
Computer Speech and Language, vol. 90, Article 101749 (2024-11-09). DOI: 10.1016/j.csl.2024.101749

Abstract: Table-filling models have recently achieved promising performance in jointly extracting relation triplets from complex sentences, leveraging their inherent structural advantage of delineating entities and relations as table cells. Nonetheless, these models concentrate predominantly on the cells corresponding to entity pairs within the predicted tables, neglecting the interrelations among other token pairs; this oversight can exclude essential token information. To address this, we introduce the Token Relation-Inspired Network (TR-Net), a novel framework for joint entity and relation extraction. It comprises a token-relation generator that adaptively constructs a token-relation table concentrating on the prominent token cells, and a structure-enhanced encoder that integrates the structural and sequential information of sentences via a highway-gate mechanism. Our experiments demonstrate that TR-Net delivers considerable improvements and achieves state-of-the-art performance on four public datasets.
Refining the evaluation of speech synthesis: A summary of the Blizzard Challenge 2023
Olivier Perrotin, Brooke Stephenson, Silvain Gerber, Gérard Bailly, Simon King
Computer Speech and Language, vol. 90, Article 101747 (2024-11-08). DOI: 10.1016/j.csl.2024.101747

Abstract: The Blizzard Challenge has benchmarked progress in text-to-speech (TTS) since 2005. Important milestones have been passed: results suggested that synthetic speech was indistinguishable from natural speech in intelligibility in 2021, and perhaps by that same year in naturalness as well. The high quality of speech generated by the latest TTS systems has thus revealed the limitations of the ITU-T P.800.1 Mean Opinion Score (MOS) in detecting the remaining differences between synthetic and natural speech; yet MOS was the only method used in previous Challenges and remains the most popular evaluation method in the field. In the 2023 Challenge, we addressed these limitations by incorporating state-of-the-art evaluation techniques to refine the assessment of speech quality, speaker similarity, and intelligibility. For speech quality, a relative comparison of the systems receiving the best MOS uncovered a greater number of significant differences between systems. For speaker similarity, we demonstrated a strong bias depending on whether listeners are familiar with the target voice. For intelligibility, evaluating language-specific phenomena, such as the pronunciation of homographs, exposed system limits better than global transcription of synthesized utterances. In addition to reporting results for the 18 entries to the 2023 Challenge, we extend the analysis by type of TTS module to provide insights into the most recent advances in model design. Overall, this year's results demonstrate the need for new methods that refine TTS evaluation and shed light on increasingly small and localized differences between synthesized and natural speech.
CLIPMulti: Explore the performance of multimodal enhanced CLIP for zero-shot text classification
Peng Wang, Dagang Li, Xuesi Hu, Yongmei Wang, Youhua Zhang
Computer Speech and Language, vol. 90, Article 101748 (2024-11-07). DOI: 10.1016/j.csl.2024.101748

Abstract: Zero-shot text classification handles text-classification tasks that lack annotated training data, without requiring large amounts of labeled data. Existing approaches use either a text–text or a text–image matching paradigm, both of which perform well on different benchmark datasets; however, each considers only a single modality for matching, and little attention has been paid to how multimodality can help text classification. To incorporate multimodality into zero-shot text classification, we propose CLIPMulti, a multimodal-enhanced CLIP framework that employs a text–image&text matching paradigm. We test three different image-and-text combinations for their effect on zero-shot text classification, and we further propose a matching method (Match-CLIPMulti) that automatically finds the corresponding text for the classified images. Experiments on seven publicly available zero-shot text classification datasets show competitive performance, and we analyze the effect of different parameters on the Match-CLIPMulti experiments. We hope this work prompts further thought and exploration of multimodal fusion in language tasks.
{"title":"UniKDD: A Unified Generative model for Knowledge-driven Dialogue","authors":"Qian Wang , Yan Chen , Yang Wang , Xu Wang","doi":"10.1016/j.csl.2024.101740","DOIUrl":"10.1016/j.csl.2024.101740","url":null,"abstract":"<div><div>knowledge-driven dialogue (KDD) is to introduce an external knowledge base, generating an informative and fluent response. However, previous works employ different models to conduct the sub-tasks of KDD, ignoring the connection between sub-tasks and resulting in a difficulty of training and inference. To solve those issues above, we propose the UniKDD, a unified generative model for KDD, which models all sub-tasks into a generation task, enhancing the connection between tasks and facilitating the training and inference. Specifically, UniKDD simplifies the complex KDD tasks into three main sub-tasks, i.e., entity prediction, attribute prediction, and dialogue generation. These tasks are transformed into a text generation task and trained by an end-to-end way. In the inference phase, UniKDD first predicts a set of entities used for current turn dialogue according to the dialogue history. Then, for each predicted entity, UniKDD predicts the corresponding attributes by the dialogue history. Finally, UniKDD generates a high-quality and informative response using the dialogue history and predicted knowledge triplets. The experimental results show that our proposed UniKDD can perform KDD task well and outperform the baseline on the evaluation of knowledge selection and response generation. The code is available at <span><span>https://github.com/qianandfei/UniKDD.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101740"},"PeriodicalIF":3.1,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring the ability of LLMs to classify written proficiency levels","authors":"Susanne DeVore","doi":"10.1016/j.csl.2024.101745","DOIUrl":"10.1016/j.csl.2024.101745","url":null,"abstract":"<div><div>This paper tests the ability of LLMs to classify language proficiency ratings of texts written by learners of English and Mandarin, taking a benchmarking research design approach. First, the impact of five variables (LLM model, prompt version, prompt language, grading scale, and temperature) on rating accuracy are tested using a basic instruction-only prompt. Second, the consistency of results is tested. Third, the top performing consistent conditions emerging from the first and second tests are used to test the impact of adding examples and/or proficiency guidelines and the use of zero-, one-, and few-shot chain-of-thought prompting techniques on accuracy rating. While performance does not meet levels necessary for real-world use cases, the results can inform ongoing development of LLMs and prompting techniques to improve accuracy. This paper highlights recent research on prompt engineering outside of the field of linguistics and selects prompt variables and techniques that are theoretically relevant to proficiency rating. Finally, it discusses key takeaways from these tests that can inform future development and why approaches that have been effective in other contexts were not as effective for proficiency rating.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101745"},"PeriodicalIF":3.1,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Entity and relationship extraction based on span contribution evaluation and focusing framework","authors":"Qibin Li , Nianmin Yao , Nai Zhou , Jian Zhao","doi":"10.1016/j.csl.2024.101744","DOIUrl":"10.1016/j.csl.2024.101744","url":null,"abstract":"<div><div>Entity and relationship extraction involves identifying named entities and extracting relationships between them. Existing research focuses on enhancing span representations, yet overlooks the impact of non-target spans(ie, the span is non-entity or the span pair has no relationship) on model training. In this work, we propose a span contribution evaluation and focusing framework named CEFF, which assigns a contribution score to each non-target span in a sentence through pre-training, which reflects the contribution of span to model performance improvement. To a certain extent, this method considers the impact of different spans on model training, making the training more targeted. Additionally, leveraging the contribution scores of non-target spans, we introduce a simplified variant of the model, termed CEFF<span><math><msub><mrow></mrow><mrow><mi>s</mi></mrow></msub></math></span>, which achieves comparable performance to models trained with all spans while utilizing fewer spans. This approach reduces training costs and improves training efficiency. Through extensive validation, we demonstrate that our contribution scores accurately reflect span contributions and achieve state-of-the-art results on five benchmark datasets.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101744"},"PeriodicalIF":3.1,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}