ChatGPT-based biological and psychological data imputation

Meta-Radiology. Pub Date: 2023-11-01. DOI: 10.1016/j.metrad.2023.100034

Anam Nazir, Muhammad Nadeem Cheema, Ze Wang

Volume 1, Issue 3, Article 100034. Full text: https://www.sciencedirect.com/science/article/pii/S2950162823000346


Abstract

Missing data are a common problem in large cohort and longitudinal research and are typically handled through data imputation. Because they rely on simplified models such as linear or nonlinear interpolation, current imputation methods may not be accurate for real-life data such as biological and behavioral data. The purpose of this work was to explore the capability of ChatGPT, a powerful Large Language Model (LLM) developed by OpenAI, for biological and psychological data imputation. We tested feasibility using data from the Human Connectome Project. Performance was evaluated by comparing the imputed data against known ground truth (GT), using the Pearson correlation coefficient (r), relative accuracy (MP), and mean absolute error (MAE). Comparative analyses with traditional imputation techniques were also conducted to demonstrate the superior efficacy of ChatGPT as a data imputer. In summary, through customized data-to-text prompt engineering, ChatGPT can capture intricate patterns and dependencies within biological data, resulting in precise imputations. Fine-tuning ChatGPT with domain-specific biological vocabulary, with a human in the loop as an interpreter, enhances the accuracy and relevance of the imputations.
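The evaluation described in the abstract, comparing imputed values against known ground truth, can be illustrated with a minimal Python sketch. It computes two of the named metrics, Pearson r and MAE; relative accuracy (MP) is left out because the abstract does not define it. The `row_to_prompt` helper is a hypothetical data-to-text template included only to show the general idea. It is not the authors' prompt, and the field names and sample values are invented for illustration.

```python
import math
from statistics import mean


def pearson_r(truth, imputed):
    """Pearson correlation coefficient between two equal-length sequences."""
    mt, mi = mean(truth), mean(imputed)
    cov = sum((t - mt) * (i - mi) for t, i in zip(truth, imputed))
    st = math.sqrt(sum((t - mt) ** 2 for t in truth))
    si = math.sqrt(sum((i - mi) ** 2 for i in imputed))
    return cov / (st * si)


def mean_absolute_error(truth, imputed):
    """Average absolute difference between ground truth and imputed values."""
    return mean(abs(t - i) for t, i in zip(truth, imputed))


def row_to_prompt(row, target):
    """Hypothetical data-to-text template (an assumption, not the paper's prompt).

    Turns the known fields of one record into a sentence and asks for the
    missing target field.
    """
    known = ", ".join(
        f"{name} is {value}"
        for name, value in row.items()
        if name != target and value is not None
    )
    return f"For one participant, {known}. Estimate the missing value of {target}."


if __name__ == "__main__":
    # Illustrative numbers only; these are not results from the paper.
    ground_truth = [1.0, 2.0, 3.0, 4.0]
    imputed = [1.5, 2.5, 3.5, 4.5]
    print("r   =", pearson_r(ground_truth, imputed))
    print("MAE =", mean_absolute_error(ground_truth, imputed))
    print(row_to_prompt({"age": 30, "sex": "F", "score": None}, "score"))
```

Reporting both metrics is useful because they capture different failures: a constant offset leaves r near 1 while MAE reveals the bias, as the sample values above show.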


