Matthew Barsalou, Pedro Manuel Saraiva, Roberto Henriques
Exploring Exploratory Data Analysis: An Empirical Test of Run Chart Utility
Management Systems in Production Engineering, published 2023-12-01
DOI: 10.2478/mspe-2023-0050
Abstract
This paper explores Exploratory Data Analysis (EDA). EDA uses graphical methods to gain insights, and these insights can be useful for forming tentative hypotheses when performing a root cause analysis (RCA). The topic of EDA is well addressed in the literature; however, empirical studies of its efficacy are lacking. We therefore evaluate EDA by comparing one group of students identifying salient features in a table against a second group of students attempting to identify salient features in the same data presented as a run chart, and then draw conclusions from this comparison. Two randomly selected groups of students received the data either as a table or as a run chart. They were then tasked with visually identifying any data points that stood out as interesting. The number of correctly identified values and the time needed to find them were each compared between the groups using a two-sample t-test to determine whether there was a statistically significant difference. The participants with a graph found the values that stood out much more quickly than those who used a table. Those using the table took much longer and failed to identify the values that stood out. However, those with a graph also had far more false positives. Much has been written on EDA in the literature; however, an empirical evaluation of this common methodology is lacking. This paper provides empirical evidence confirming the effectiveness of EDA.
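The group comparison above relies on a two-sample t-test. As an illustration only, the sketch below computes such a statistic with Python's standard library. The abstract does not state which variant was used, so the pooled-variance (Student's) form shown here is an assumption, as is the hypothetical function name `pooled_two_sample_t`; Welch's unequal-variance test is a common alternative. The function returns the t statistic and its degrees of freedom; converting these into a p-value requires a t-distribution CDF, and `scipy.stats.ttest_ind(a, b, equal_var=True)` computes the same statistic together with a p-value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_two_sample_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Student's two-sample t statistic with pooled variance (hypothetical helper).

    Assumes both groups come from populations with equal variances.
    Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # Pooled variance: each group's sample variance weighted by its degrees of freedom
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    # Standard error of the difference between the two group means
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / se
    return t, n1 + n2 - 2
```

In a setting like the one described, `a` might hold one group's per-participant completion times (or counts of correctly identified values) and `b` the other group's; a large absolute t relative to the critical value for the returned degrees of freedom would indicate a statistically significant difference.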