Sketch+ for Visual and Correlation-Based Exploratory Data Analysis: A Case Study with COVID-19 Databases

M. Cazzolato, L. S. Rodrigues, M. X. Ribeiro, M. A. Gutierrez, C. Traina, A. J. Traina
Journal: J. Inf. Data Manag. Published: 2022-09-12. DOI: 10.5753/jidm.2022.2484 (https://doi.org/10.5753/jidm.2022.2484)
Citations: 2

Abstract

The amount of data generated daily by different sources grows exponentially and brings new challenges to information technology experts. The recorded data usually include heterogeneous attribute types: traditional ones, such as dates and numerical, textual, and categorical information, as well as complex ones, such as images, videos, and multidimensional data. Simply posing similarity queries over such records can underestimate the semantics and potential usefulness of particular attributes. In this context, Exploratory Data Analysis (EDA) is well suited to understanding data and to extracting and visualizing existing patterns. In this paper, we propose Sketch+, a technique and a corresponding supporting tool for comparing electronic health records (provided by hospitals) by similarity. It supports correlation-based exploratory analysis over attributes of different types and allows data preprocessing tasks for visualization and knowledge extraction. Sketch+ computes partial and overall data correlation considering the distance spaces induced by the attributes. It employs both ANOVA and association rules with lift correlations to study relationships between variables, allowing extensive data analysis. Among the tools provided, a pixel-oriented one helps analysts observe visual correlations among date, categorical, and numerical attributes. As a running case study, we employed three open databases of COVID-19 cases, showing that specialists can benefit from the inference modules of Sketch+ to analyze electronic records. The study highlights how Sketch+ can be employed to spot strong correlations among tuples and attributes, with statistically significant results. The exploratory analysis is shown to be an essential complement to similarity search tasks, identifying and evaluating patterns from heterogeneous attributes.
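
The abstract names two statistical ingredients, association rules scored by lift and ANOVA, without showing how either is computed. The sketch below illustrates both in plain Python. It is not the Sketch+ implementation: the function names `lift` and `one_way_anova_f`, the record layout (a list of dictionaries), and the toy records are all assumptions made for illustration, and the records are invented rather than drawn from the COVID-19 databases used in the paper.

```python
from statistics import mean


def lift(records, antecedent, consequent):
    """Lift of the rule antecedent -> consequent.

    records is a list of dicts; antecedent and consequent are
    (attribute, value) pairs. lift = P(A and C) / (P(A) * P(C)).
    A value above 1 suggests a positive association.
    """
    n = len(records)
    a_attr, a_val = antecedent
    c_attr, c_val = consequent
    n_a = sum(1 for r in records if r[a_attr] == a_val)
    n_c = sum(1 for r in records if r[c_attr] == c_val)
    n_ac = sum(
        1 for r in records if r[a_attr] == a_val and r[c_attr] == c_val
    )
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent or consequent never occurs")
    return (n_ac / n) / ((n_a / n) * (n_c / n))


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over a list of numeric groups."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Invented toy records (hypothetical attribute names and values).
records = [
    {"icu": "yes", "outcome": "death", "age": 81},
    {"icu": "yes", "outcome": "death", "age": 77},
    {"icu": "yes", "outcome": "recovered", "age": 64},
    {"icu": "no", "outcome": "recovered", "age": 35},
    {"icu": "no", "outcome": "recovered", "age": 42},
    {"icu": "no", "outcome": "recovered", "age": 58},
]

rule_lift = lift(records, ("icu", "yes"), ("outcome", "death"))

# Group a numerical attribute by a categorical one, then compare means.
ages_by_outcome = {}
for r in records:
    ages_by_outcome.setdefault(r["outcome"], []).append(r["age"])
f_stat = one_way_anova_f(list(ages_by_outcome.values()))
```

Lift relates two categorical attributes, while the ANOVA F statistic relates a categorical attribute to a numerical one, which is one plausible reading of why the abstract pairs the two for heterogeneous records. Turning the F statistic into a p-value would need the F distribution, which this sketch leaves out.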