Computer Speech and Language: Latest Articles

Knowledge-aware audio-grounded generative slot filling for limited annotated data
IF 3.1 · CAS Tier 3 · Computer Science
Computer Speech and Language · Pub Date: 2024-08-05 · DOI: 10.1016/j.csl.2024.101707
Guangzhi Sun, Chao Zhang, Ivan Vulić, Paweł Budzianowski, Philip C. Woodland

Manually annotating fine-grained slot-value labels for task-oriented dialogue (ToD) systems is an expensive and time-consuming endeavour. This motivates research into slot-filling methods that operate with limited amounts of labelled data. Moreover, the majority of current work on ToD is based solely on text as the input modality, neglecting the additional challenges of imperfect automatic speech recognition (ASR) when working with spoken language. In this work, we propose a Knowledge-Aware Audio-Grounded generative slot-filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input. KA2G achieves robust and data-efficient slot filling for speech-based ToD by (1) framing it as a text generation task, (2) grounding text generation additionally in the audio modality, and (3) conditioning on available external knowledge (e.g. a predefined list of possible slot values). We show that combining both modalities within the KA2G framework improves robustness against ASR errors. Further, the knowledge-aware slot-value generator in KA2G, implemented via a pointer-generator mechanism, particularly benefits few-shot and zero-shot learning. Experiments, conducted on the standard speech-based single-turn SLURP dataset and a multi-turn dataset extracted from a commercial ToD system, show strong and consistent gains over prior work, especially in few-shot and zero-shot setups.

Volume 89, Article 101707 · Open Access · Citations: 0
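The pointer-generator mechanism mentioned in the abstract can be sketched in miniature: the final output distribution mixes a vocabulary distribution with a copy distribution derived from attention over source tokens, so out-of-vocabulary slot values (e.g. from an ASR hypothesis or a knowledge list) remain generable. The names and numbers below are illustrative, not KA2G's actual implementation.

```python
def pointer_generator_mix(p_vocab, attention, source_tokens, p_gen):
    """Toy pointer-generator step: blend a vocabulary distribution with
    a copy distribution induced by attention over source tokens."""
    # Copy distribution: attention mass accumulated per source token.
    p_copy = {}
    for tok, a in zip(source_tokens, attention):
        p_copy[tok] = p_copy.get(tok, 0.0) + a
    # Final distribution: p_gen * P_vocab(w) + (1 - p_gen) * P_copy(w)
    words = set(p_vocab) | set(p_copy)
    return {w: p_gen * p_vocab.get(w, 0.0) + (1 - p_gen) * p_copy.get(w, 0.0)
            for w in words}

# Example: "jazz" is absent from the decoder vocabulary but can still be
# produced by copying it from the (hypothetical) source utterance.
p_vocab = {"music": 0.6, "genre": 0.4}
attention = [0.1, 0.7, 0.2]
source = ["play", "jazz", "play"]
mixed = pointer_generator_mix(p_vocab, attention, source, p_gen=0.5)
```

Because both input distributions sum to one, the mixture is itself a valid probability distribution over the union of vocabulary and source tokens.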
Speech self-supervised representations benchmarking: A case for larger probing heads
IF 3.1 · CAS Tier 3 · Computer Science
Computer Speech and Language · Pub Date: 2024-08-03 · DOI: 10.1016/j.csl.2024.101695
Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli

Self-supervised learning (SSL) leverages large datasets of unlabelled speech to reach impressive performance with reduced amounts of annotated data. The high number of proposed approaches has fostered the emergence of comprehensive benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has been growing, most proposals rely on a single downstream architecture that maps the frozen SSL representations to the task labels. This study examines how benchmarking results are affected by changes in the probing-head architecture. Interestingly, we found that altering the downstream architecture leads to significant fluctuations in the performance ranking of the evaluated models. Against common practice in speech SSL benchmarking, we evaluate larger-capacity probing heads, showing their impact on performance, inference costs, generalization, and multi-level feature exploitation.

Volume 89, Article 101695 · Open Access · Citations: 0
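The capacity gap the paper probes is easy to make concrete: a linear probe on frozen SSL features has far fewer parameters than even a one-hidden-layer MLP head. A minimal sketch with hypothetical dimensions (768-d features, 10 classes), not tied to any specific benchmark:

```python
import numpy as np

def linear_probe_params(d_in, n_classes):
    # Weight matrix plus bias vector.
    return d_in * n_classes + n_classes

def mlp_probe_params(d_in, d_hidden, n_classes):
    # One hidden layer: (in -> hidden) and (hidden -> out), each with bias.
    return (d_in * d_hidden + d_hidden) + (d_hidden * n_classes + n_classes)

def mlp_probe_forward(x, W1, b1, W2, b2):
    """Map a frozen SSL feature vector x to class probabilities."""
    h = np.maximum(0.0, x @ W1 + b1)      # ReLU hidden layer
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
d_in, d_hidden, n_classes = 768, 256, 10  # hypothetical sizes
x = rng.normal(size=d_in)
W1 = rng.normal(size=(d_in, d_hidden)) * 0.01
b1 = np.zeros(d_hidden)
W2 = rng.normal(size=(d_hidden, n_classes)) * 0.01
b2 = np.zeros(n_classes)
probs = mlp_probe_forward(x, W1, b1, W2, b2)
```

At these sizes the MLP head has roughly 26 times the parameters of the linear probe, which is the kind of capacity difference whose effect on rankings the study measures.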
Improved relation extraction through key phrase identification using community detection on dependency trees
IF 3.1 · CAS Tier 3 · Computer Science
Computer Speech and Language · Pub Date: 2024-08-02 · DOI: 10.1016/j.csl.2024.101706
Shuang Liu, Xunqin Chen, Jiana Meng, Niko Lukač

This paper presents a method for extracting relations from sentences by using their dependency trees to identify key phrases. Dependency trees are commonly used in natural language processing to represent the grammatical structure of a sentence, and this approach builds on that representation to extract meaningful relations between phrases. Identifying key phrases is crucial in relation extraction, as they often indicate the entities and actions involved in a relation. The method applies community detection algorithms to the dependency tree to identify groups of related words that form key phrases, such as subject-verb-object structures. Experiments on the SemEval-2010 Task 8 dataset and the TACRED dataset demonstrate that the proposed method outperforms existing baseline methods.

Volume 89, Article 101706 · Open Access · Citations: 0
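The abstract does not specify which community detection algorithm the authors use; as a toy illustration of the general idea, the sketch below scores every single-edge cut of a small dependency tree by Newman modularity (one Girvan-Newman-style step), which here separates a noun phrase from the rest of the sentence:

```python
from collections import deque

def components(nodes, edges):
    """Connected components via BFS."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        comp, queue = {n}, deque([n])
        seen.add(n)
        while queue:
            for m in adj[queue.popleft()]:
                if m not in seen:
                    seen.add(m)
                    comp.add(m)
                    queue.append(m)
        comps.append(comp)
    return comps

def modularity(nodes, edges, comps):
    """Newman modularity of a partition, computed on the full graph."""
    m = len(edges)
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    q = 0.0
    for c in comps:
        internal = sum(1 for u, v in edges if u in c and v in c)
        d = sum(deg[n] for n in c)
        q += internal / m - (d / (2 * m)) ** 2
    return q

def best_tree_cut(nodes, edges):
    """Remove the single edge whose induced split maximizes modularity."""
    best = (-1.0, None, None)
    for e in edges:
        rest = [x for x in edges if x != e]
        comps = components(nodes, rest)
        q = modularity(nodes, edges, comps)
        if q > best[0]:
            best = (q, e, comps)
    return best

# Toy dependency tree for "The quick cat chased the small mouse".
nodes = ["The", "quick", "cat", "chased", "the", "small", "mouse"]
edges = [("chased", "cat"), ("chased", "mouse"), ("cat", "The"),
         ("cat", "quick"), ("mouse", "the"), ("mouse", "small")]
q, cut, comps = best_tree_cut(nodes, edges)
```

The best-scoring cut detaches the subject noun phrase "The quick cat" as one community, which is the kind of key-phrase grouping the paper exploits.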
Assessing language models’ task and language transfer capabilities for sentiment analysis in dialog data
IF 3.1 · CAS Tier 3 · Computer Science
Computer Speech and Language · Pub Date: 2024-07-31 · DOI: 10.1016/j.csl.2024.101704
Vlad-Andrei Negru, Vasile Suciu, Alex-Mihai Lăpuşan, Camelia Lemnaru, Mihaela Dînşoreanu, Rodica Potolea

Our work explores the differences between GRU-based and transformer-based approaches to sentiment analysis on text dialog. In addition to overall performance on the downstream task, we assess the models’ knowledge transfer capabilities through a thorough zero-shot analysis at the task level and through cross-lingual evaluation across five European languages. The ability to generalize over different tasks and languages is highly important, as the data needed for a particular application may be scarce or non-existent. We perform evaluations on both known benchmark datasets and a novel synthetic dialog dataset containing Romanian call-center conversations. We study the most appropriate combination of synthetic and real data for fine-tuning on the downstream task, enabling our models to perform in low-resource environments. We leverage the informative power of conversational context, showing that appending the previous four utterances of the same speaker to the input sequence yields the greatest benefit to inference performance. The cross-lingual and cross-task evaluations show that the transformer-based models possess superior transfer abilities to the GRU model, especially in the zero-shot setting. Owing to its prior intensive fine-tuning on multiple labeled datasets for various tasks, FLAN-T5 excels in the zero-shot task experiments, obtaining a zero-shot accuracy of 51.27% on the IEMOCAP dataset, while the classical BERT obtained the highest zero-shot accuracy on the MELD dataset with 55.08%.

Volume 89, Article 101704 · Open Access · Citations: 0
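The context-window finding (appending the previous four utterances of the same speaker) corresponds to a simple input-construction step. The function name and separator token below are assumptions for illustration, not the authors' code:

```python
def build_input(dialog, index, k=4):
    """Prepend the previous k utterances by the same speaker to utterance `index`.

    `dialog` is a list of (speaker, text) pairs; "[SEP]" is an assumed separator."""
    speaker, current = dialog[index]
    history = [text for spk, text in dialog[:index] if spk == speaker][-k:]
    return " [SEP] ".join(history + [current])

# Toy call-center exchange: only same-speaker history is attached.
dialog = [
    ("agent", "Hello, how can I help?"),
    ("caller", "My internet is down."),
    ("agent", "Since when?"),
    ("caller", "Since this morning."),
    ("caller", "And the router light is red."),
]
inp = build_input(dialog, 4)
```

The first utterance of a dialog has no history, so it is passed through unchanged.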
COfEE: A comprehensive ontology for event extraction from text
IF 3.1 · CAS Tier 3 · Computer Science
Computer Speech and Language · Pub Date: 2024-07-31 · DOI: 10.1016/j.csl.2024.101702
Ali Balali, Masoud Asadpour, Seyed Hossein Jafari

Large volumes of data are constantly being published on the web; however, the majority of this data is unstructured, making it difficult to comprehend and interpret. To extract meaningful and structured information from such data, researchers and practitioners have turned to Information Extraction (IE) methods. One of the most challenging IE tasks is Event Extraction (EE), which involves extracting information related to specific incidents and their associated actors from text. EE has broad applications, including building knowledge bases, information retrieval, summarization, and online monitoring systems. Over the past few decades, various event ontologies, such as ACE, CAMEO, and ICEWS, have been developed to define event forms, actors, and dimensions of events observed in text. However, these ontologies have limitations: they cover only a few topics, such as political events, have inflexible structures for defining argument roles, lack analytical dimensions, and offer insufficient gold-standard data. To address these concerns, we propose a new event ontology, COfEE, which integrates expert domain knowledge, previous ontologies, and a data-driven approach to identifying events from text. COfEE comprises two hierarchy levels (event types and event sub-types), including new categories related to environmental issues, cyberspace, criminal activity, and natural disasters that require real-time monitoring. In addition, dynamic roles are defined for each event sub-type to capture the various dimensions of events. The proposed ontology is evaluated on Wikipedia events and shown to be comprehensive and general. Furthermore, to facilitate the preparation of gold-standard data for event extraction, we present a language-independent online tool based on COfEE. A gold-standard dataset of 24,000 Persian news articles, annotated by ten human experts according to the COfEE ontology, is also prepared. To diversify the data, news articles from the Wikipedia event portal and the 100 most popular Persian news agencies between 2008 and 2021 were collected. Finally, we introduce a supervised method based on deep learning techniques to automatically extract relevant events and their corresponding actors.

Volume 89, Article 101702 · Open Access · Citations: 0
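The two-level hierarchy with dynamic roles per event sub-type can be sketched as a small schema. The category and role names below are illustrative guesses, not COfEE's actual inventory:

```python
# Two-level hierarchy: event type -> sub-type -> role slots (dynamic per sub-type).
ONTOLOGY = {
    "natural-disaster": {
        "earthquake": ["location", "magnitude", "time", "casualties"],
        "flood": ["location", "time", "affected-area", "casualties"],
    },
    "cyberspace": {
        "data-breach": ["organization", "records-exposed", "time"],
    },
}

def validate_event(event):
    """Check that an extracted event uses only roles defined for its sub-type."""
    roles = ONTOLOGY.get(event["type"], {}).get(event["subtype"])
    if roles is None:
        return False
    return all(r in roles for r in event["arguments"])

ok = validate_event({"type": "natural-disaster", "subtype": "earthquake",
                     "arguments": {"location": "Tabriz", "magnitude": "7.1"}})
bad = validate_event({"type": "cyberspace", "subtype": "data-breach",
                      "arguments": {"magnitude": "7.1"}})
```

Defining roles per sub-type rather than globally is what makes the role structure "dynamic": a magnitude slot is valid for an earthquake but meaningless for a data breach.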
Conversations in the wild: Data collection, automatic generation and evaluation
IF 3.1 · CAS Tier 3 · Computer Science
Computer Speech and Language · Pub Date: 2024-07-30 · DOI: 10.1016/j.csl.2024.101699
Nimra Zaheer, Agha Ali Raza, Mudassir Shabbir

The aim of conversational speech processing is to analyze human conversations in natural settings. It has numerous applications, including personality-trait identification, speech therapy, speaker identification and verification, speech emotion detection, and speaker diarization. However, the large-scale annotated datasets required for feature extraction and conversational model training exist only for a handful of languages (e.g. English, Mandarin, and French), as gathering, cleaning, and annotating such datasets is tedious, time-consuming, and expensive. We propose two scalable, language-agnostic algorithms for automatically generating multi-speaker, variable-length, spontaneous conversations. These algorithms synthesize conversations using existing non-conversational speech datasets. We also contribute the resulting datasets (283 hours, 50 speakers). For comparison, we gathered the first spontaneous conversational dataset for Urdu (24 hours, 212 speakers) from public talk shows. Using speaker diarization as an example, we evaluate our datasets and report the first baseline diarization error rates (DER) for Urdu: 25% for models based on the synthetic datasets and 29% for natural conversations. Our conversational speech generation technique allows speaker diarization pipelines to be trained without preparing huge conversational corpora.

Volume 89, Article 101699 · Open Access · Citations: 0
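The reported DER baselines follow the standard decomposition into missed speech, false alarms, and speaker confusion. A simplified frame-level sketch (real DER scoring also applies a forgiveness collar and an optimal reference-to-hypothesis speaker mapping, both omitted here):

```python
def frame_der(reference, hypothesis):
    """Simplified frame-level diarization error rate.

    Each list holds one speaker label per frame, or None for silence.
    DER = (missed speech + false alarm + confusion) / reference speech frames."""
    assert len(reference) == len(hypothesis)
    miss = false_alarm = confusion = 0
    ref_speech = 0
    for r, h in zip(reference, hypothesis):
        if r is not None:
            ref_speech += 1
        if r is not None and h is None:
            miss += 1                      # speech labelled as silence
        elif r is None and h is not None:
            false_alarm += 1               # silence labelled as speech
        elif r is not None and r != h:
            confusion += 1                 # wrong speaker label
    return (miss + false_alarm + confusion) / ref_speech

# Toy example: one confusion frame and one missed frame out of 7 speech frames.
ref = ["A", "A", "A", "B", "B", None, "B", "A"]
hyp = ["A", "A", "B", "B", None, None, "B", "A"]
der = frame_der(ref, hyp)
```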
Prompting large language models for user simulation in task-oriented dialogue systems
IF 3.1 · CAS Tier 3 · Computer Science
Computer Speech and Language · Pub Date: 2024-07-26 · DOI: 10.1016/j.csl.2024.101697
Atheer Algherairy, Moataz Ahmed

Large Language Models (LLMs) have gained widespread popularity due to their instruction-following abilities. In this study, we evaluate their ability to simulate user interactions for task-oriented dialogue (TOD) systems. Our findings demonstrate that prompting LLMs reveals promising capabilities for training and testing dialogue policies, eliminating the need for domain expertise to craft complex rules or for large annotated datasets, as required by traditional simulators. The results show that the dialogue system trained with the ChatGPT simulator achieves a success rate of 59%, comparable to the 62% success rate of the dialogue system trained with the manual-rules, agenda-based user simulator (ABUS). Furthermore, the dialogue system trained with the ChatGPT simulator demonstrates better generalization than the system trained with the ABUS: its success rate is higher by 4% on GenTUS, 5% on the ChatGPT simulator, and 3% on the Llama simulator. Moreover, LLM-based user simulators provide a challenging environment, with lexically rich, diverse, and random responses. The Llama simulator outperforms the human reference on all lexical diversity metrics, with margins of 0.66 in SE, 0.39 in CE, 0.01 in MSTTR, 0.04 in HDD, and 0.55 in MTLD, while the ChatGPT simulator achieves comparable results. This ultimately contributes to enhancing the system’s ability to generalize more effectively.

Volume 89, Article 101697 · Open Access · Citations: 0
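Of the lexical diversity metrics cited (SE, CE, MSTTR, HDD, MTLD), MSTTR is the simplest to illustrate: the mean type-token ratio over consecutive fixed-length segments. The segment length below is an arbitrary choice for the toy example, not the paper's setting:

```python
def msttr(tokens, segment=5):
    """Mean segmental type-token ratio: average TTR over consecutive
    full segments of fixed length (a trailing partial segment is dropped)."""
    ttrs = []
    for i in range(0, len(tokens) - segment + 1, segment):
        seg = tokens[i:i + segment]
        ttrs.append(len(set(seg)) / segment)
    return sum(ttrs) / len(ttrs)

# Repeated words ("for", "two") lower the ratio in the second segment.
tokens = "i want to book a table for two people for two".split()
score = msttr(tokens, segment=5)
```

Segmenting makes the measure robust to text length, which is why MSTTR is preferred over a raw type-token ratio when comparing utterances of different sizes.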
Demystifying large language models in second language development research
IF 3.1 · CAS Tier 3 · Computer Science
Computer Speech and Language · Pub Date: 2024-07-26 · DOI: 10.1016/j.csl.2024.101700
Yan Cong

Evaluating students’ textual responses is a common and critical task in language research and education practice. However, manual assessment can be tedious and may lack consistency, posing challenges for both scientific discovery and frontline teaching. Leveraging state-of-the-art large language models (LLMs), we define and operationalize LLM-Surprisal, a numeric representation of the interplay between lexical diversity and syntactic complexity, and empirically and theoretically demonstrate its relevance for automatic writing assessment and for the English writing development of Chinese L2 (second-language) learners. We developed an LLM-based natural language processing pipeline that automatically computes text Surprisal scores. By comparing Surprisal metrics with the classic indices widely used in L2 studies, we extend the use of computational metrics in Chinese learners’ L2 English writing. Our analyses suggest that LLM-Surprisal can distinguish L2 from L1 (first-language) writing, index L2 development stages, and predict scores provided by human professionals, indicating that the Surprisal dimension may manifest critical aspects of L2 development. The relative advantages and disadvantages of these approaches are discussed in depth. We conclude that LLMs are promising tools that can enhance L2 research. Our showcase paves the way for more nuanced approaches to computationally assessing and understanding L2 development. Our pipelines and findings will inspire language teachers, learners, and researchers to operationalize LLMs in an innovative and accessible manner.

Volume 89, Article 101700 · Open Access · Citations: 0
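Surprisal itself is just the negative log-probability a model assigns to each token given its context; with an LLM, the probabilities are read off the model's next-token distribution. A toy sketch with made-up probabilities (the paper's actual LLM-Surprisal pipeline is not reproduced here):

```python
import math

def surprisal_bits(p):
    """Surprisal of a token in bits: -log2 P(token | context)."""
    return -math.log2(p)

def mean_surprisal(probs):
    """Average surprisal over a text, given one probability per token."""
    return sum(surprisal_bits(p) for p in probs) / len(probs)

# Made-up per-token probabilities standing in for an LLM's output:
# predictable continuations yield low surprisal, unexpected ones high.
probs = [0.5, 0.25, 0.125]   # 1, 2, and 3 bits respectively
score = mean_surprisal(probs)
```

A text full of conventional, high-probability word choices thus scores low, while unusual lexical or syntactic choices push the mean up, which is the intuition behind using surprisal as a development index.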
The effect of preference elicitation methods on the user experience in conversational recommender systems
IF 3.1 · CAS Tier 3 · Computer Science
Computer Speech and Language · Pub Date: 2024-07-25 · DOI: 10.1016/j.csl.2024.101696
Liv Ziegfeld, Daan Di Scala, Anita H.M. Cremers

The prevalence of conversational interfaces is rising rapidly, as improved algorithms allow for remarkable proficiency in understanding and generating natural language. This also holds for Conversational Recommender Systems (CRS), which benefit from information provided by the user in the course of the dialogue to offer personalized recommendations. However, the challenge remains to elicit the user’s characteristics and preferences in a way that leads to the best user experience. Hence, the current research investigated the effect of different Preference Elicitation (PE) methods on the user experience of a CRS. We introduce two axes along which PE methods can be classified, namely the degree of system-prompt guidance and the level of user-input restriction. We built three versions of a CRS to conduct a between-subjects experiment comparing three conditions: high guidance with high restriction, high guidance with low restriction, and low guidance with low restriction. We tested their effect on ten user experience constructs with 66 European participants, all working in agriculture or forestry.

The study found no significant effects of the three preference elicitation methods on any of the user experience constructs collected through questionnaires. However, we did find significant differences in the objective measures of chat duration (Speed), response time (Cognitive Demand), and recommendation performance (Accuracy of Recommended Items). Regarding recommendation performance, the preference elicitation methods with high guidance led to a higher match score than the condition with low guidance. The certainty score was highest in the condition with high guidance and high input restriction. Finally, a question at the end of the conversation showed that users who were satisfied with the recommendation responded more positively on six out of ten user experience constructs. This suggests that satisfaction with the recommendation performance is a crucial factor in the user experience of CRSs.

Volume 89, Article 101696 · Open Access · Citations: 0
Theory of mind performance of large language models: A comparative analysis of Turkish and English
IF 3.1 · CAS Tier 3 · Computer Science
Computer Speech and Language · Pub Date: 2024-07-25 · DOI: 10.1016/j.csl.2024.101698
Burcu Ünlütabak, Onur Bal

Theory of mind (ToM), the understanding of others’ mental states, is a defining human skill. Research assessing LLMs’ ToM performance yields conflicting findings, leading to debate about whether and how they could show ToM understanding. Psychological research indicates that the characteristics of a specific language can influence how mental states are represented and communicated. It is therefore reasonable to expect language characteristics to influence how LLMs communicate with humans, especially when the conversation involves references to mental states. This study examines how these characteristics affect LLMs’ ToM performance by evaluating GPT-3.5 and GPT-4 in English and Turkish. Turkish provides an excellent contrast to English, since Turkish has a different syntactic structure and special verbs, san- and zannet-, meaning “falsely believe.” Using OpenAI’s Chat Completion API, we collected responses from GPT models for first- and second-order ToM scenarios in English and Turkish. Our approach combined completion prompts and open-ended questions within the same chat session, offering deep insight into the models’ reasoning processes. Our data showed that while GPT models can respond accurately to standard ToM tasks (100% accuracy), their performance deteriorates to below chance level under slight modifications. This high sensitivity suggests a lack of robustness in ToM performance. GPT-4 outperformed its predecessor, GPT-3.5, showing some improvement in ToM performance. The models generally performed better when tasks were presented in English than in Turkish. These findings indicate that GPT models cannot yet reliably pass first-order and second-order ToM tasks in either language. They have significant implications for the explainability of LLMs, highlighting the challenges and biases these models face when simulating human-like ToM understanding in different languages.

Volume 89, Article 101698 · Open Access · Citations: 0