Dylan Wootton, Amy Rae Fox, Evan Peck, Arvind Satyanarayan
arXiv - CS - Human-Computer Interaction · Published 2024-09-16 · DOI: arxiv-2409.10450
Charting EDA: Characterizing Interactive Visualization Use in Computational Notebooks with a Mixed-Methods Formalism
Interactive visualizations are powerful tools for Exploratory Data Analysis (EDA), but how do they affect the observations analysts make about their data? We conducted a qualitative experiment with 13 professional data scientists analyzing two datasets in Jupyter notebooks, collecting a rich dataset of interaction traces and think-aloud utterances. By qualitatively coding participant utterances, we introduce a formalism that describes EDA as a sequence of analysis states, where each state consists of either a representation an analyst constructs (e.g., the output of a data frame, an interactive visualization) or an observation the analyst makes (e.g., about missing data or the relationship between variables). Applying our formalism to our dataset, we find that interactive visualizations, on average, lead to earlier and more complex insights about relationships between dataset attributes than static visualizations do. Moreover, by calculating metrics such as revisit count and representational diversity, we uncover that some representations serve more as "planning aids" during EDA than as tools strictly for hypothesis answering. We show how these measures help identify other patterns of analysis behavior, such as an "80-20 rule," where a small subset of representations drives the majority of observations. Based on these findings, we offer design guidelines for interactive exploratory analysis tooling and reflect on future directions for studying the role visualizations play in EDA.
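To make the formalism concrete, here is a minimal sketch of how an EDA session might be encoded as a sequence of analysis states, with the paper's revisit-count and representational-diversity metrics computed over it. All field names, labels, and the example session are hypothetical illustrations, not taken from the paper's actual coding scheme or data.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical encoding of the formalism: an EDA session is a sequence of
# analysis states, each either a representation the analyst constructs or
# an observation the analyst makes.
@dataclass(frozen=True)
class State:
    kind: str   # "representation" or "observation"
    label: str  # e.g. "df_head", "scatter_vis", "missing_data_note"

def revisit_counts(states):
    """Times each representation is returned to after its first use."""
    counts = Counter(s.label for s in states if s.kind == "representation")
    return {label: n - 1 for label, n in counts.items()}

def representational_diversity(states):
    """Number of distinct representations used in the session."""
    return len({s.label for s in states if s.kind == "representation"})

# A toy session: the analyst inspects a data frame, builds a scatter plot,
# and later revisits that plot to note outliers.
session = [
    State("representation", "df_head"),
    State("observation", "missing_values_in_age"),
    State("representation", "scatter_vis"),
    State("observation", "price_correlates_with_size"),
    State("representation", "scatter_vis"),  # revisit
    State("observation", "outliers_in_price"),
]

print(revisit_counts(session))              # {'df_head': 0, 'scatter_vis': 1}
print(representational_diversity(session))  # 2
```

With traces like this, the "80-20 rule" the paper reports could be checked by asking what fraction of observations follow the most-revisited representations.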