Exploiting GPT for synthetic data generation: An empirical study

IF 7.8 1区 管理学 Q1 INFORMATION SCIENCE & LIBRARY SCIENCE
Tony Busker , Sunil Choenni , Mortaza S. Bargh
{"title":"Exploiting GPT for synthetic data generation: An empirical study","authors":"Tony Busker ,&nbsp;Sunil Choenni ,&nbsp;Mortaza S. Bargh","doi":"10.1016/j.giq.2024.101988","DOIUrl":null,"url":null,"abstract":"<div><div>There are many good reasons to use synthetic data instead of real data for research purposes. These reasons may range from the business sensitiveness of real data to increased cost of collecting real data in accordance with GDPR requirements. In this paper, we elaborate upon the potentials of the Large Language Model GPT as a tool to generate synthetic data for analytical purposes when there is no real-data available or accessible. Primarily, we show that by varying the scope of probes adequately, we can generate data of different granularities. To show this, we generated stereotypical data with three levels of granularity by posing more than 18,500 probes to GPT. In total, we generated stereotypical data for eight different views, which can be categorized in three view types corresponding to the three levels of granularity. Secondarily, we show that by varying the scope of probes one can create meaningful information. To show this, we performed a so-called similarity analysis on the generated stereotypical data. We used data visualizations, e.g. heatmaps, to show the views and categories within the views that are similar and those that are at odd with each other. We elaborate upon the application areas of the insight gained about such similarities and differences. Furthermore, we discuss several other types of analysis that can be performed on the generated stereotypical data.</div></div>","PeriodicalId":48258,"journal":{"name":"Government Information Quarterly","volume":"42 1","pages":"Article 101988"},"PeriodicalIF":7.8000,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Government Information Quarterly","FirstCategoryId":"91","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0740624X24000807","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"INFORMATION SCIENCE & LIBRARY SCIENCE","Score":null,"Total":0}
引用次数: 0

Abstract

There are many good reasons to use synthetic data instead of real data for research purposes. These reasons may range from the business sensitiveness of real data to increased cost of collecting real data in accordance with GDPR requirements. In this paper, we elaborate upon the potentials of the Large Language Model GPT as a tool to generate synthetic data for analytical purposes when there is no real-data available or accessible. Primarily, we show that by varying the scope of probes adequately, we can generate data of different granularities. To show this, we generated stereotypical data with three levels of granularity by posing more than 18,500 probes to GPT. In total, we generated stereotypical data for eight different views, which can be categorized in three view types corresponding to the three levels of granularity. Secondarily, we show that by varying the scope of probes one can create meaningful information. To show this, we performed a so-called similarity analysis on the generated stereotypical data. We used data visualizations, e.g. heatmaps, to show the views and categories within the views that are similar and those that are at odd with each other. We elaborate upon the application areas of the insight gained about such similarities and differences. Furthermore, we discuss several other types of analysis that can be performed on the generated stereotypical data.
求助全文
约1分钟内获得全文 求助全文
来源期刊
Government Information Quarterly
Government Information Quarterly INFORMATION SCIENCE & LIBRARY SCIENCE-
CiteScore
15.70
自引率
16.70%
发文量
106
期刊介绍: Government Information Quarterly (GIQ) delves into the convergence of policy, information technology, government, and the public. It explores the impact of policies on government information flows, the role of technology in innovative government services, and the dynamic between citizens and governing bodies in the digital age. GIQ serves as a premier journal, disseminating high-quality research and insights that bridge the realms of policy, information technology, government, and public engagement.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信