Haneen Mohammed, Charlie Summers, Sughosh Kaushik, Eugene Wu
SmokedDuck Demonstration: SQLStepper
Companion of the 2023 International Conference on Management of Data, published 2023-06-04
DOI: 10.1145/3555041.3589731 (https://doi.org/10.1145/3555041.3589731)
Abstract
Fine-grained lineage tracks the relationships between the input and output of a query, and is particularly useful in analytical applications such as query debugging, view maintenance, query explanations, and data cleaning. Prior approaches rewrite SQL queries to also track lineage, but the rewritten queries can slow execution in analytical engines that are designed to process complex query patterns on large datasets. Moreover, they mainly capture lineage at the logical level. SmokedDuck extends DuckDB to support fast lineage capture and querying: it tracks lineage at the instruction level, leveraging the duality between lineage and data movement. In this demonstration, we show how a user can leverage operator-level lineage to understand and debug a query's execution through SQLStepper, an application built on top of SmokedDuck. Users upload data and execute queries using an in-browser command line, then explore query-level and operator-level lineage visually to track down bugs.
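
To make the distinction between operator-level and query-level lineage concrete, the following is a minimal Python sketch, not SmokedDuck's implementation or API. It assumes two toy operators (`filter_op` and `group_sum_op`, both hypothetical names) that record, as a by-product of moving rows, which input positions produced each output row. A hypothetical helper, `backward_lineage`, then composes the per-operator records to recover the base-table rows behind one query result. The `orders` table and its columns are made up for illustration.

```python
from collections import defaultdict


def filter_op(rows, predicate):
    """Toy selection: emit matching rows and record each one's input position."""
    out, lineage = [], []
    for in_idx, row in enumerate(rows):
        if predicate(row):
            out.append(row)
            # Output position len(out) - 1 was copied from input position in_idx.
            lineage.append(in_idx)
    return out, lineage


def group_sum_op(rows, key, value):
    """Toy aggregation: one output row per group, plus the input positions in it."""
    groups = defaultdict(list)
    for in_idx, row in enumerate(rows):
        groups[row[key]].append(in_idx)
    out, lineage = [], []
    for group_key, idxs in groups.items():
        out.append({key: group_key, "total": sum(rows[i][value] for i in idxs)})
        lineage.append(idxs)
    return out, lineage


def backward_lineage(out_idx, agg_lineage, filter_lineage):
    """Compose operator-level lineage into query-level lineage for one output row."""
    return sorted(filter_lineage[i] for i in agg_lineage[out_idx])


# Hypothetical input table.
orders = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": -5},
    {"region": "east", "amount": 7},
    {"region": "west", "amount": 3},
]

# Roughly: SELECT region, SUM(amount) FROM orders WHERE amount > 0 GROUP BY region
filtered, f_lin = filter_op(orders, lambda r: r["amount"] > 0)
result, g_lin = group_sum_op(filtered, "region", "amount")

# Which rows of `orders` contributed to the first output row?
print(result[0], backward_lineage(0, g_lin, f_lin))
```

In this sketch, `f_lin` and `g_lin` are the operator-level lineage a user could inspect step by step, while `backward_lineage` gives the query-level view. A real engine would capture this inside its execution operators rather than in application code.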