A dataset of annotated free comments on the sensory perception of madeleines for benchmarking text mining techniques

Impact factor: 1.0 (JCR Q3, Multidisciplinary Sciences)
Michel Visalli, Ronan Symoneaux, Cécile Mursic, Margaux Touret, Flore Lourtioux, Kipédène Coulibaly, Benjamin Mahieu
Data in Brief, volume 58, Article 111250. Published 2025-02-01. DOI: 10.1016/j.dib.2024.111250
Full text: https://www.sciencedirect.com/science/article/pii/S2352340924012125
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11742558/pdf/

Abstract

This dataset was created to investigate the impact of data collection modes and pre-processing techniques on the quality of free comment data related to consumers' sensory perceptions. A total of 200 consumers were recruited and divided into two groups of 100. Each group evaluated six madeleine samples (five distinct samples and one replicate) in a sensory analysis laboratory, using different free comment data collection modes. Consumers in the first group provided only words or short expressions, while those in the second group used complete sentences. Additionally, participants reported their liking for each sample.
The collected data provided valuable insights into the effectiveness of the free comment method in sensory evaluation of food products. They emphasized the importance of data pre-processing and demonstrated how the chosen techniques can impact the quality of the results. The dataset is based on real-world consumer data, showcasing how individuals naturally express their subjective perceptions. It features descriptions that reflect authentic consumer language, including informal expressions, incorrect phrasing, spelling errors, and unstructured sentences. This raw textual data has been annotated and translated into English. The dataset can therefore be repurposed to assess and compare the performance of different text mining, natural language processing and sentiment analysis algorithms in both French and English, as well as to drive innovations in AI tools for sensory and consumer research.
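The point about pre-processing can be illustrated with a minimal sketch. It assumes a simple normalization pipeline (lowercasing, accent stripping, tokenization, stop-word removal) and is not the authors' method. The function names, the stop-word list, and the example comments are all hypothetical and are not taken from the dataset.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (already accent-stripped).
STOPWORDS = {"un", "une", "le", "la", "les", "de", "du", "des",
             "et", "en", "est", "tres", "trop", "peu"}

def normalize(comment: str) -> list[str]:
    """Lowercase, strip accents, keep alphabetic tokens, drop stop words."""
    text = unicodedata.normalize("NFKD", comment.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOPWORDS]

def term_counts(comments: list[str]) -> Counter:
    """Count in how many comments each normalized term appears."""
    counts = Counter()
    for comment in comments:
        counts.update(set(normalize(comment)))  # one count per comment
    return counts

# Invented examples of the two collection modes (not real dataset entries).
short_mode = ["moelleux, sucré", "Sucré ; beurre"]
sentence_mode = ["La madeleine est moelleuse et sucrée."]

print(term_counts(short_mode))
print(term_counts(sentence_mode))
```

In this sketch the short-expression comments yield the term `sucre`, while the full sentence yields `sucree`, so the two collection modes do not share a vocabulary unless a further step such as lemmatization is added. This is one concrete way the choice of pre-processing technique can change the resulting descriptor counts.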