Statistical Data Analysis using GPT3: An Overview

2022 IEEE Bombay Section Signature Conference (IBSSC), Pub Date: 2022-12-08, DOI: 10.1109/IBSSC56953.2022.10037383

Ashwin Sharma, Disha Devalia, Wilfred Almeida, Harshali P. Patil, A. Mishra



Abstract

Although automated statistics has begun to gain momentum in data analysis, it is not unified and is very slow on large datasets. Because of computing limitations or a lack of domain-specific knowledge, general statistics have been the most commonly used tools. Research advisors are now drawn to machine learning-based approaches to the statistical analysis of datasets, which may help bridge the gap between traditional techniques such as correlation matrices and p-values and newer models such as GPT3. This paper proposes a novel approach to analyzing large datasets that uses GPT3 to predict insights from statistics calculated on the data. The research addresses the limitations of existing methods and proposes a framework for analyzing large statistical datasets that solves many computationally challenging problems efficiently. The proposed method builds on GPT3's features: the model learns to predict individual words from the parts of the dataset passed to it as prompts (cumulative sums, means, etc.), which enables the analysis of extremely large datasets such as telecom churn or census data. Traditional methods, statistical analysis, and machine learning approaches are compared with GPT3. Finally, the pros and cons of using GPT3 for this research are discussed in terms of performance, accuracy, and reliability.
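The abstract describes passing calculated statistics, rather than raw rows, to the model as a prompt. The following is a minimal sketch of that idea, not the authors' implementation: the function names (`summarize_column`, `pearson`, `build_prompt`), the prompt layout, and the sample churn-style columns are all assumptions made for illustration. The sketch stops at building the prompt text and does not call any model API.

```python
import math
import statistics


def summarize_column(values):
    """Return a few summary statistics for one numeric column."""
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def build_prompt(dataset_name, columns):
    """Turn per-column statistics and pairwise correlations into prompt text.

    The layout is hypothetical; the source does not specify a prompt format.
    """
    lines = [f"Dataset: {dataset_name}", "Summary statistics:"]
    for name, values in columns.items():
        s = summarize_column(values)
        lines.append(
            f"- {name}: n={s['count']}, sum={s['sum']:.2f}, mean={s['mean']:.2f}, "
            f"stdev={s['stdev']:.2f}, min={s['min']:.2f}, max={s['max']:.2f}"
        )
    lines.append("Pairwise Pearson correlations:")
    names = list(columns)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            lines.append(f"- {a} vs {b}: r={pearson(columns[a], columns[b]):.2f}")
    lines.append("Insights:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up values standing in for a telecom churn dataset.
    sample = {
        "tenure_months": [1, 12, 24, 36, 48, 60],
        "monthly_charges": [70.5, 65.0, 80.2, 55.1, 90.0, 45.3],
    }
    print(build_prompt("telecom churn (illustrative)", sample))
```

Because only aggregates enter the prompt, its length depends on the number of columns rather than the number of rows, which is one plausible reading of how the approach scales to large datasets.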
