{"title":"UniKDD: A Unified Generative model for Knowledge-driven Dialogue","authors":"Qian Wang , Yan Chen , Yang Wang , Xu Wang","doi":"10.1016/j.csl.2024.101740","DOIUrl":"10.1016/j.csl.2024.101740","url":null,"abstract":"<div><div>Knowledge-driven dialogue (KDD) introduces an external knowledge base to generate informative and fluent responses. However, previous works employ different models for the sub-tasks of KDD, ignoring the connection between sub-tasks and making training and inference difficult. To address these issues, we propose UniKDD, a unified generative model for KDD that casts all sub-tasks as a generation task, strengthening the connection between tasks and simplifying training and inference. Specifically, UniKDD simplifies the complex KDD task into three main sub-tasks, i.e., entity prediction, attribute prediction, and dialogue generation. These tasks are transformed into a text generation task and trained in an end-to-end way. In the inference phase, UniKDD first predicts the set of entities used for the current dialogue turn according to the dialogue history. Then, for each predicted entity, UniKDD predicts the corresponding attributes from the dialogue history. Finally, UniKDD generates a high-quality and informative response using the dialogue history and the predicted knowledge triplets. The experimental results show that the proposed UniKDD performs the KDD task well and outperforms the baseline on the evaluation of knowledge selection and response generation.
The code is available at <span><span>https://github.com/qianandfei/UniKDD.git</span></span>.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101740"},"PeriodicalIF":3.1,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring the ability of LLMs to classify written proficiency levels","authors":"Susanne DeVore","doi":"10.1016/j.csl.2024.101745","DOIUrl":"10.1016/j.csl.2024.101745","url":null,"abstract":"<div><div>This paper tests the ability of LLMs to classify language proficiency ratings of texts written by learners of English and Mandarin, taking a benchmarking research design approach. First, the impact of five variables (LLM model, prompt version, prompt language, grading scale, and temperature) on rating accuracy is tested using a basic instruction-only prompt. Second, the consistency of results is tested. Third, the top performing consistent conditions emerging from the first and second tests are used to test the impact of adding examples and/or proficiency guidelines and the use of zero-, one-, and few-shot chain-of-thought prompting techniques on rating accuracy. While performance does not meet levels necessary for real-world use cases, the results can inform ongoing development of LLMs and prompting techniques to improve accuracy. This paper highlights recent research on prompt engineering outside of the field of linguistics and selects prompt variables and techniques that are theoretically relevant to proficiency rating.
Finally, it discusses key takeaways from these tests that can inform future development and why approaches that have been effective in other contexts were not as effective for proficiency rating.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101745"},"PeriodicalIF":3.1,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Entity and relationship extraction based on span contribution evaluation and focusing framework","authors":"Qibin Li , Nianmin Yao , Nai Zhou , Jian Zhao","doi":"10.1016/j.csl.2024.101744","DOIUrl":"10.1016/j.csl.2024.101744","url":null,"abstract":"<div><div>Entity and relationship extraction involves identifying named entities and extracting relationships between them. Existing research focuses on enhancing span representations, yet overlooks the impact of non-target spans (i.e., spans that are not entities or span pairs that have no relationship) on model training. In this work, we propose a span contribution evaluation and focusing framework named CEFF, which uses pre-training to assign each non-target span in a sentence a contribution score reflecting how much that span contributes to improving model performance. To a certain extent, this method considers the impact of different spans on model training, making the training more targeted. Additionally, leveraging the contribution scores of non-target spans, we introduce a simplified variant of the model, termed CEFF<span><math><msub><mrow></mrow><mrow><mi>s</mi></mrow></msub></math></span>, which achieves comparable performance to models trained with all spans while utilizing fewer spans. This approach reduces training costs and improves training efficiency.
Through extensive validation, we demonstrate that our contribution scores accurately reflect span contributions and achieve state-of-the-art results on five benchmark datasets.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101744"},"PeriodicalIF":3.1,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Taking relations as known conditions: A tagging based method for relational triple extraction","authors":"Guanqing Kong , Qi Lei","doi":"10.1016/j.csl.2024.101734","DOIUrl":"10.1016/j.csl.2024.101734","url":null,"abstract":"<div><div>Relational triple extraction refers to extracting entities and relations from natural texts, which is a crucial task in the construction of knowledge graphs. Recently, tagging based methods have received increasing attention because of their simple and effective structural form. Among them, the two-step extraction method is prone to category imbalance. To address this issue, we propose a novel two-step extraction method, which first extracts subjects, generates a fixed-size embedding for each relation, and then regards these relations as known conditions to extract the objects directly with the identified subjects. In order to eliminate the influence of irrelevant relations when predicting objects, we use a relation-special attention mechanism and a gate unit to select appropriate relations. In addition, most current models do not account for two-way interaction between tasks, so we design a feature interactive network to achieve bidirectional interaction between subject and object extraction tasks and enhance their connection.
Experimental results on NYT, WebNLG, NYT<span><math><msup><mrow></mrow><mrow><mo>⋆</mo></mrow></msup></math></span> and WebNLG<span><math><msup><mrow></mrow><mrow><mo>⋆</mo></mrow></msup></math></span> datasets show that our model is competitive among joint extraction models.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101734"},"PeriodicalIF":3.1,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What’s so complex about conversational speech? A comparison of HMM-based and transformer-based ASR architectures","authors":"Julian Linke , Bernhard C. Geiger , Gernot Kubin , Barbara Schuppler","doi":"10.1016/j.csl.2024.101738","DOIUrl":"10.1016/j.csl.2024.101738","url":null,"abstract":"<div><div>Highly performing speech recognition is important for more fluent human–machine interaction (e.g., dialogue systems). Modern ASR architectures achieve human-level recognition performance on read speech but still perform sub-par on conversational speech, which arguably is or, at least, will be instrumental for human–machine interaction. Understanding the factors behind this shortcoming of modern ASR systems may suggest directions for improving them. In this work, we compare the performances of HMM- vs. transformer-based ASR architectures on a corpus of Austrian German conversational speech. Specifically, we investigate how strongly utterance length, prosody, pronunciation, and utterance complexity as measured by perplexity affect different ASR architectures. Among other findings, we observe that single-word utterances – which are characteristic of conversational speech and constitute roughly 30% of the corpus – are recognized more accurately if their F0 contour is flat; for longer utterances, the effects of the F0 contour tend to be weaker. 
We further find that zero-shot systems require longer utterance lengths and are less robust to pronunciation variation, which indicates that pronunciation lexicons and fine-tuning on the respective corpus are essential ingredients for the successful recognition of conversational speech.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101738"},"PeriodicalIF":3.1,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tickling translations: Small but mighty open-sourced transformers bring English PUN-ny entities to life in French!","authors":"Farhan Dhanani, Muhammad Rafi, Muhammad Atif Tahir","doi":"10.1016/j.csl.2024.101739","DOIUrl":"10.1016/j.csl.2024.101739","url":null,"abstract":"<div><div>Recent advancements in transformer-based language models have demonstrated substantial progress in producing good translations. Despite these achievements, challenges persist in translating playful requests, especially when users intentionally introduce humor. Deciphering the hidden pun among such playful requests is one of the major difficulties for modern language models, which causes user dissatisfaction. This paper targets a specific niche of humor translation, which is the translation of English-named entities containing puns into French using small-scale open-sourced transformer models. The transformer architecture serves as a foundation for popular language models like ChatGPT. It allows learning long-range contextual relationships within sequences. The main novelty of the paper is the proposed extractive question/answering (Q/A) styled technique based on the transformers to find relevant translations for the provided English nouns using the openly available parallel corpora. To evaluate the effectiveness of our method, we utilize a dataset provided by the JOKER CLEF automatic pun and humor translation 2022 team. The dataset contains single-word nouns from popular novels, anime, movies, and games, each containing a pun. The discussed methodology and experimental framework are adaptable and can be extended to any language pair for which an openly available parallel corpus exists.
This flexibility underscores the broader applicability of our findings and suggests the potential for enhancing humor translation across various language combinations.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101739"},"PeriodicalIF":3.1,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining replay and LoRA for continual learning in natural language understanding","authors":"Zeinab Borhanifard, Heshaam Faili, Yadollah Yaghoobzadeh","doi":"10.1016/j.csl.2024.101737","DOIUrl":"10.1016/j.csl.2024.101737","url":null,"abstract":"<div><div>Large language models have significantly improved dialogue systems through enhanced capabilities in understanding queries and generating responses. Despite these enhancements, task-oriented dialogue systems – which power many intelligent assistants – face challenges when adapting to new domains and applications. This challenge arises from a phenomenon known as catastrophic forgetting, where models forget previously acquired knowledge when learning new tasks. This paper addresses this issue through continual learning techniques to preserve previously learned knowledge while seamlessly integrating new tasks and domains. We propose <strong>E</strong>xperience <strong>R</strong>eplay <strong>I</strong>nformative-<strong>Lo</strong>w <strong>R</strong>ank <strong>A</strong>daptation or ERI-LoRA, a hybrid continual learning method for natural language understanding in dialogue systems that effectively combines the replay-based methods with parameter-efficient techniques. Our experiments on intent detection and slot-filling tasks demonstrate that ERI-LoRA significantly outperforms competitive baselines in continual learning.
The results of our catastrophic forgetting experiments demonstrate that ERI-LoRA maintains robust memory stability in the model, demonstrating its effectiveness in mitigating these effects.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101737"},"PeriodicalIF":3.1,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142553128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing pipeline task-oriented dialogue systems using post-processing networks","authors":"Atsumoto Ohashi, Ryuichiro Higashinaka","doi":"10.1016/j.csl.2024.101742","DOIUrl":"10.1016/j.csl.2024.101742","url":null,"abstract":"<div><div>Many studies have proposed methods for optimizing the dialogue performance of an entire pipeline task-oriented dialogue system by jointly training modules in the system using reinforcement learning. However, these methods are limited in that they can only be applied to modules implemented using trainable neural-based methods. To solve this problem, we propose a method for optimizing the dialogue performance of a pipeline system that consists of modules implemented with arbitrary methods for dialogue. With our method, neural-based components called post-processing networks (PPNs) are installed inside such a system to post-process the output of each module. All PPNs are updated to improve the overall dialogue performance of the system by using reinforcement learning, not necessitating that each module be differentiable. Through dialogue simulations and human evaluations on two well-studied task-oriented dialogue datasets, CamRest676 and MultiWOZ, we show that our method can improve the dialogue performance of pipeline systems consisting of various modules. 
In addition, a comprehensive analysis of the results of the MultiWOZ experiments reveals the patterns of post-processing by PPNs that contribute to the overall dialogue performance of the system.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101742"},"PeriodicalIF":3.1,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A knowledge-Aware NLP-Driven conversational model to detect deceptive contents on social media posts","authors":"Deepak Kumar Jain , S. Neelakandan , Ankit Vidyarthi , Anand Mishra , Ahmed Alkhayyat","doi":"10.1016/j.csl.2024.101743","DOIUrl":"10.1016/j.csl.2024.101743","url":null,"abstract":"<div><div>The widespread dissemination of deceptive content on social media presents a substantial challenge to preserving authenticity and trust. The epidemic growth of false news is due to the greater use of social media to transmit news, rather than conventional mass media such as newspapers, magazines, radio, and television. Humans' inability to distinguish between true and false facts makes fake news a threat to logical truth, democracy, journalism, and government credibility. Using a combination of advanced methodologies, deep learning (DL) methods, and natural language processing (NLP) approaches, researchers and technology developers attempt to build robust systems capable of discerning the subtle nuances that betray deceptive intent. By analysing the conversational linguistic patterns of misleading data, these techniques aim to improve the resilience of social platforms against the spread of deceptive content, eventually contributing to a more informed and trustworthy online platform. This paper proposes a Knowledge-Aware NLP-Driven AlBiruni Earth Radius Optimization Algorithm with Deep Learning Tool for Enhanced Deceptive Content Detection (BER-DLEDCD) algorithm on social media. The purpose of the BER-DLEDCD system is to identify and classify deceptive content using NLP with an optimal DL model. In the BER-DLEDCD technique, data pre-processing converts the input data into a compatible format. Furthermore, the BER-DLEDCD approach applies a hybrid DL technique combining a Convolutional Neural Network with Long Short-Term Memory (CNN-LSTM) for deceptive content detection.
Moreover, the BER approach is used to improve the hyperparameter selection of the CNN-LSTM technique, which leads to enhanced detection performance. The simulation outcome of the BER-DLEDCD system was examined on a benchmark database. The results indicate that the BER-DLEDCD system achieved strong performance, with an accuracy of 94 %, a precision of 94.83 %, and an F-score of 94.30 %, in comparison with other recent approaches.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101743"},"PeriodicalIF":3.1,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ECDG-DST: A dialogue state tracking model based on efficient context and domain guidance for smart dialogue systems","authors":"Meng Zhu , Xiaolong Xu","doi":"10.1016/j.csl.2024.101741","DOIUrl":"10.1016/j.csl.2024.101741","url":null,"abstract":"<div><div>Dialogue state tracking (DST) is an important component of smart dialogue systems, with the goal of predicting the current dialogue state at each conversation turn. However, most previous works suffer from storing a large amount of data and a large amount of noisy information when the conversation takes many turns. In addition, they overlook the effect of the domain in the task of dialogue state tracking. In this paper, we propose ECDG-DST (a dialogue state tracking model based on efficient context and domain guidance) for smart dialogue systems, which preserves key information while retaining less dialogue history, and masks the domain effectively in dialogue state tracking. Our model utilizes the efficient conversation context, the previous conversation state, and the relationship between domains and slots to narrow the range of slots to be updated, and also limits the directions of values to reduce the generation of irrelevant words. The ECDG-DST model consists of four main components: an encoder, a domain guide, an operation predictor, and a value generator. We conducted experiments on three popular task-oriented dialogue datasets, Wizard-of-Oz2.0, MultiWOZ2.0, and MultiWOZ2.1, and the empirical results demonstrate that ECDG-DST improved joint goal accuracy by 0.45 % on Wizard-of-Oz2.0, 2.44 % on MultiWOZ2.0, and 2.05 % on MultiWOZ2.1, respectively, compared to the baselines.
In addition, we analyzed the scope of the efficient context through experiments and validated the effectiveness of our proposed domain guide mechanism through an ablation study.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"90 ","pages":"Article 101741"},"PeriodicalIF":3.1,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}