实时流应用的交互式数据清理

Timo Räth, Ngozichukwuka Onah, K. Sattler
{"title":"实时流应用的交互式数据清理","authors":"Timo Räth, Ngozichukwuka Onah, K. Sattler","doi":"10.1145/3597465.3605229","DOIUrl":null,"url":null,"abstract":"The importance of data cleaning systems has continuously grown in recent years. Especially for real-time streaming applications, it is crucial, to identify and possibly remove anomalies in the data on the fly before further processing. The main challenge however lies in the construction of an appropriate data cleaning pipeline, which is complicated by the dynamic nature of streaming applications. To simplify this process and help data scientists to explore and understand the incoming data, we propose an interactive data cleaning system for streaming applications. In this paper, we list requirements for such a system and present our implementation to overcome the stated issues. Our demonstration shows, how a data cleaning pipeline can be interactively created, executed, and monitored at runtime. We also present several different tools, such as the automated advisor and the adaptive visualizer, that engage the user in the data cleaning process and help them understand the behavior of the pipeline.","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"51 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Interactive Data Cleaning for Real-Time Streaming Applications\",\"authors\":\"Timo Räth, Ngozichukwuka Onah, K. Sattler\",\"doi\":\"10.1145/3597465.3605229\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The importance of data cleaning systems has continuously grown in recent years. Especially for real-time streaming applications, it is crucial, to identify and possibly remove anomalies in the data on the fly before further processing. The main challenge however lies in the construction of an appropriate data cleaning pipeline, which is complicated by the dynamic nature of streaming applications. To simplify this process and help data scientists to explore and understand the incoming data, we propose an interactive data cleaning system for streaming applications. In this paper, we list requirements for such a system and present our implementation to overcome the stated issues. Our demonstration shows, how a data cleaning pipeline can be interactively created, executed, and monitored at runtime. We also present several different tools, such as the automated advisor and the adaptive visualizer, that engage the user in the data cleaning process and help them understand the behavior of the pipeline.\",\"PeriodicalId\":92279,\"journal\":{\"name\":\"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)\",\"volume\":\"51 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3597465.3605229\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3597465.3605229","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

近年来,数据清洗系统的重要性不断提高。特别是对于实时流应用程序,在进一步处理之前,识别并可能消除数据中的异常是至关重要的。然而,主要的挑战在于构建适当的数据清理管道,这由于流应用程序的动态特性而变得复杂。为了简化这一过程并帮助数据科学家探索和理解传入的数据,我们提出了一种用于流应用程序的交互式数据清洗系统。在本文中,我们列出了这样一个系统的需求,并提出了我们的实现来克服这些问题。我们的演示展示了如何在运行时交互式地创建、执行和监视数据清理管道。我们还介绍了几种不同的工具,如自动顾问和自适应可视化工具,它们使用户参与数据清理过程,并帮助他们理解管道的行为。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Interactive Data Cleaning for Real-Time Streaming Applications
The importance of data cleaning systems has continuously grown in recent years. Especially for real-time streaming applications, it is crucial, to identify and possibly remove anomalies in the data on the fly before further processing. The main challenge however lies in the construction of an appropriate data cleaning pipeline, which is complicated by the dynamic nature of streaming applications. To simplify this process and help data scientists to explore and understand the incoming data, we propose an interactive data cleaning system for streaming applications. In this paper, we list requirements for such a system and present our implementation to overcome the stated issues. Our demonstration shows, how a data cleaning pipeline can be interactively created, executed, and monitored at runtime. We also present several different tools, such as the automated advisor and the adaptive visualizer, that engage the user in the data cleaning process and help them understand the behavior of the pipeline.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信