Bilingual Dialogue Dataset with Personality and Emotion Annotations for Personality Recognition in Education.

IF 5.8 · Zone 2, Multidisciplinary Journals · JCR Q1 MULTIDISCIPLINARY SCIENCES
Zhi Liu, Yao Xiao, Zhu Su, Luyao Ye, Kaili Lu, Xian Peng
Scientific Data, 12(1), 514. Journal Article. Published 2025-03-27.
DOI: 10.1038/s41597-025-04836-w (https://doi.org/10.1038/s41597-025-04836-w)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11950162/pdf/
Citations: 0

Abstract

Dialogue datasets are essential for advancing natural language processing (NLP) tasks. However, many existing datasets lack integrated annotations for personality and emotion, limiting models' ability to effectively capture these aspects and generate personalized, human-like dialogues, which ultimately impacts user experience. To address this challenge, we construct bilingual dialogue datasets in Chinese and English, incorporating Big Five personality traits and emotion annotations. We utilize the AutoGen tool within a multi-agent framework to generate multi-turn question-answering dialogue datasets based on fables. By creating persona agents with diverse personalities, we effectively enhance the heterogeneity of personalities, overcoming previous limitations in personality diversity. Finally, we validate the utterance quality in the dataset and investigate the alignment between conversational utterances and speakers' personality traits. Moreover, by integrating emotional annotations for each utterance, this dataset offers significant potential for developing emotion-aware systems that automatically detect personality traits. It serves as a valuable resource for advancing emotionally intelligent dialogue systems and research in personality and affective computing.
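To make the annotation layout described above concrete, the following is a minimal sketch of how one dialogue record might be represented and sanity-checked. The abstract does not specify the dataset's file format, so every name here (`Utterance`, `Dialogue`, `validate`, the "high"/"low" trait levels, the emotion strings) is a hypothetical assumption for illustration, not the authors' schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The Big Five trait names are standard; how the dataset encodes them is assumed.
BIG_FIVE = (
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
)


@dataclass
class Utterance:
    speaker: str   # name of the persona agent that produced the turn
    text: str      # the utterance itself
    emotion: str   # per-utterance emotion label (label set is hypothetical)


@dataclass
class Dialogue:
    language: str                        # "zh" or "en" (assumed codes)
    fable: str                           # title of the source fable
    personas: Dict[str, Dict[str, str]]  # speaker -> trait -> "high" / "low"
    turns: List[Utterance] = field(default_factory=list)


def validate(dialogue: Dialogue) -> List[str]:
    """Return a list of human-readable problems; empty means the record is consistent."""
    problems = []
    if dialogue.language not in ("zh", "en"):
        problems.append(f"unknown language: {dialogue.language}")
    for speaker, traits in dialogue.personas.items():
        missing = [t for t in BIG_FIVE if t not in traits]
        if missing:
            problems.append(f"{speaker} is missing traits: {missing}")
    for i, turn in enumerate(dialogue.turns):
        if turn.speaker not in dialogue.personas:
            problems.append(f"turn {i}: speaker {turn.speaker} has no persona")
        if not turn.emotion:
            problems.append(f"turn {i}: no emotion annotation")
    return problems


if __name__ == "__main__":
    persona = {trait: "high" for trait in BIG_FIVE}
    record = Dialogue(
        language="en",
        fable="The Tortoise and the Hare",
        personas={"student": persona, "teacher": dict(persona, neuroticism="low")},
        turns=[
            Utterance("student", "Why did the hare stop to rest?", "curious"),
            Utterance("teacher", "He was sure he could not lose.", "neutral"),
        ],
    )
    print(validate(record))
```

Keeping the speaker-level personality profile separate from the utterance-level emotion label mirrors the two annotation layers the abstract names, and lets a check like `validate` catch turns whose speaker has no persona before the record is used for training.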

Source journal: Scientific Data (Social Sciences-Education)
CiteScore: 11.20
Self-citation rate: 4.10%
Articles published: 689
Review time: 16 weeks
About the journal: Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data. The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.