Bilingual Dialogue Dataset with Personality and Emotion Annotations for Personality Recognition in Education.

IF 5.8 · Zone 2 (multidisciplinary journals) · JCR Q1, MULTIDISCIPLINARY SCIENCES

Scientific Data · Published: 2025-03-27 · DOI: 10.1038/s41597-025-04836-w

Zhi Liu, Yao Xiao, Zhu Su, Luyao Ye, Kaili Lu, Xian Peng

Scientific Data, volume 12, issue 1, page 514.

Citations: 0

Abstract

Dialogue datasets are essential for advancing natural language processing (NLP) tasks. However, many existing datasets lack integrated annotations for personality and emotion, which limits models' ability to capture these aspects and to generate personalized, human-like dialogues, and ultimately degrades user experience. To address this challenge, we construct bilingual dialogue datasets in Chinese and English annotated with Big Five personality traits and emotions. We use the AutoGen tool within a multi-agent framework to generate multi-turn question-answering dialogues based on fables. By creating persona agents with diverse personalities, we increase the heterogeneity of personalities in the data, overcoming the limited personality diversity of previous datasets. Finally, we validate the quality of the utterances in the dataset and examine how well conversational utterances align with the speakers' personality traits. Moreover, because every utterance carries an emotion annotation, the dataset offers significant potential for developing emotion-aware systems that automatically detect personality traits. It serves as a valuable resource for advancing emotionally intelligent dialogue systems and research in personality and affective computing.
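The generation setup summarized in the abstract (persona agents with Big Five profiles exchanging question-answer turns about a fable, with an emotion label on every utterance) can be illustrated with a minimal Python sketch. It does not call AutoGen or any LLM: `respond` and `label_emotion` are placeholder callables where an LLM-backed agent and an emotion annotator would plug in. The record layout (`Persona`, `Utterance`, `Dialogue`), the high/low trait encoding, the 32-persona grid, and the prompt wording are all hypothetical, not the dataset's published schema or the authors' exact method.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Dict, List

# Standard Big Five dimensions; the "high"/"low" levels are an assumed encoding.
BIG_FIVE = ("openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism")


@dataclass
class Persona:
    name: str
    traits: Dict[str, str]  # trait name -> "high" or "low" (hypothetical)

    def system_prompt(self, fable: str) -> str:
        # Hypothetical instruction text for an LLM-backed persona agent.
        desc = ", ".join(f"{lvl} {t}" for t, lvl in self.traits.items())
        return (f"You are {self.name}, a dialogue participant with {desc}. "
                f"Discuss this fable in character:\n{fable}")


@dataclass
class Utterance:
    speaker: str
    text: str
    emotion: str  # one emotion label per utterance (label set assumed)


@dataclass
class Dialogue:
    fable: str
    speakers: Dict[str, Persona]  # keeps each speaker's trait labels with the dialogue
    turns: List[Utterance] = field(default_factory=list)


def all_personas() -> List[Persona]:
    """Every high/low combination of the five traits: 2**5 = 32 personas."""
    return [
        Persona(f"student_{i:02d}", dict(zip(BIG_FIVE, levels)))
        for i, levels in enumerate(product(("high", "low"), repeat=len(BIG_FIVE)))
    ]


def run_dialogue(fable: str, asker: Persona, answerer: Persona,
                 questions: List[str],
                 respond: Callable[[Persona, str, str], str],
                 label_emotion: Callable[[str], str]) -> Dialogue:
    """Alternate question and answer turns, annotating every utterance."""
    dlg = Dialogue(fable, {asker.name: asker, answerer.name: answerer})
    for q in questions:
        answer = respond(answerer, fable, q)  # placeholder for an LLM agent reply
        for speaker, text in ((asker, q), (answerer, answer)):
            dlg.turns.append(Utterance(speaker.name, text, label_emotion(text)))
    return dlg


# Usage with stub callables (no model involved).
personas = all_personas()
dlg = run_dialogue(
    "The Tortoise and the Hare", personas[0], personas[-1],
    ["Who won the race?", "Why did the hare lose?"],
    respond=lambda p, f, q: f"{p.name} answers: {q}",
    label_emotion=lambda text: "neutral",
)
print(len(personas), len(dlg.turns))  # 32 4
```

Enumerating trait combinations is one simple way to force personality heterogeneity across agents; storing the speaker's trait labels alongside each emotion-annotated turn is what lets the same records serve both personality recognition and emotion-aware modelling.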


Source journal: Scientific Data (Social Sciences-Education)

CiteScore: 11.20

Self-citation rate: 4.10%

Articles published: 689

Review time: 16 weeks

About the journal: Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across the natural sciences, medicine, engineering, and the social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and give credit to those who share their data. The journal primarily publishes Data Descriptors, which describe research datasets in detail, including data collection methods and technical analyses validating data quality. Data Descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.