ChatGPT-based biological and psychological data imputation

Anam Nazir, Muhammad Nadeem Cheema, Ze Wang
Meta-Radiology, Volume 1, Issue 3, Article 100034. Published 2023-11-01. DOI: 10.1016/j.metrad.2023.100034
https://www.sciencedirect.com/science/article/pii/S2950162823000346

Abstract

Missing data are a common problem in large cohort and longitudinal research and are typically handled through data imputation. Because current imputation methods rely on simplified models such as linear or nonlinear interpolation, they may not be accurate for real-life data such as biological and behavioral measurements. The purpose of this work was to explore the capability of ChatGPT, a powerful Large Language Model (LLM) developed by OpenAI, for biological and psychological data imputation. We tested feasibility using data from the Human Connectome Project. Performance was evaluated by comparing the imputed data against known ground truth (GT) using the Pearson correlation coefficient (r), relative accuracy (MP), and mean absolute error (MAE). Comparative analyses with traditional imputation techniques were also conducted to demonstrate the superior efficacy of ChatGPT as a data imputer. In summary, through customized data-to-text prompt engineering, ChatGPT can successfully capture intricate patterns and dependencies within biological data, resulting in precise imputations. Fine-tuning ChatGPT with domain-specific biological vocabulary, with a human in the loop as an interpreter, enhances the accuracy and relevance of the imputations.
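
The evaluation described above can be illustrated with a minimal sketch, offered under stated assumptions rather than as the authors' implementation. The abstract does not define relative accuracy (MP), so the formula below (one minus the mean relative absolute error) is an assumption, as are the prompt wording and the field names `age`, `gender`, and `score`, which are hypothetical. Pearson r and MAE follow their standard definitions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(imputed, truth):
    """Mean absolute difference between imputed values and ground truth."""
    return sum(abs(a - b) for a, b in zip(imputed, truth)) / len(truth)


def relative_accuracy(imputed, truth):
    """ASSUMPTION: MP taken as 1 - mean(|imputed - truth| / |truth|).

    The source does not give the formula; ground-truth values must be nonzero.
    """
    rel_errors = [abs(a - b) / abs(b) for a, b in zip(imputed, truth)]
    return 1.0 - sum(rel_errors) / len(rel_errors)


def row_to_prompt(record, target):
    """Serialize one record into a text prompt asking for the missing field.

    Hypothetical data-to-text format; the paper's actual prompts are not shown.
    """
    known = ", ".join(
        f"{key} is {value}"
        for key, value in record.items()
        if key != target and value is not None
    )
    return (
        f"For a participant where {known}, estimate the missing value of "
        f"{target}. Reply with a number only."
    )


if __name__ == "__main__":
    truth = [10.0, 20.0, 30.0, 40.0]     # known ground truth (toy values)
    imputed = [11.0, 19.0, 33.0, 38.0]   # values returned by an imputer (toy values)
    print("r   =", pearson_r(imputed, truth))
    print("MAE =", mean_absolute_error(imputed, truth))
    print("MP  =", relative_accuracy(imputed, truth))
    print(row_to_prompt({"age": 29, "gender": "F", "score": None}, "score"))
```

In a setup like this, ground-truth values would be masked before prompting, and the same three metrics would be computed for each baseline imputer so that the comparison is made on identical masked entries.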

