Companion of the 2023 International Conference on Management of Data: Latest Articles

PyNKDV: An Efficient Network Kernel Density Visualization Library for Geospatial Analytic Systems
Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589711
Tsz Nam Chan, Rui Zang, Pak Lon Ip, Leong Hou U, Jianliang Xu
Abstract: Network kernel density visualization (NKDV) is an important tool for many application domains, including criminology and transportation science. However, all existing software tools, e.g., SANET (a plug-in for QGIS and ArcGIS) and spNetwork (an R package), adopt the naïve implementation of NKDV, which does not scale to large-scale location datasets and high-resolution sizes. To overcome this issue, we develop the first Python library, called PyNKDV, which adopts our complexity-reduced solution and its parallel implementation to significantly improve the efficiency of generating NKDV. Moreover, PyNKDV is also user friendly (with four lines of Python code) and can support commonly used geospatial analytic systems (e.g., QGIS and ArcGIS). In this demonstration, we will use three large-scale location datasets (up to 7.71 million data points), provide different Python scripts (in the Jupyter Notebook), and install existing software tools (i.e., SANET and spNetwork) for participants to (1) explore different functionalities of our PyNKDV library and (2) compare its practical efficiency with existing software tools.
Citations: 2
Large-scale Geospatial Analytics: Problems, Challenges, and Opportunities
Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589401
Tsz Nam Chan, Leong Hou U, Byron Choi, Jianliang Xu, R. Cheng
Abstract: Geospatial analytics is an important field in many communities, including crime science, transportation science, epidemiology, ecology, and urban planning. However, with the rapid growth of big geospatial data, most of the commonly used geospatial analytic tools are not efficient enough (or not even feasible) to support large-scale datasets. As such, domain experts have raised concerns about the inefficiency of these tools. In this tutorial, which consists of four parts, we aim to draw the attention of database researchers to this important, emerging, database-related, and interdisciplinary topic. In the first part, we will discuss different problems and highlight the challenges for two types of geospatial analytic tools, which are (1) hotspot detection and (2) correlation analysis. In the second and third parts, we will specifically discuss two geospatial analytic tools, namely kernel density visualization (the representative hotspot detection method) and K-function (the representative correlation analysis method), respectively, and their variants. In the fourth part, we will highlight the future opportunities for this topic.
Citations: 1
SmokedDuck Demonstration: SQLStepper
Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589731
Haneen Mohammed, Charlie Summers, Sughosh Kaushik, Eugene Wu
Abstract: Fine-grained lineage tracks the relationships between input and output of a query, and is particularly useful in analytical applications such as query debugging, view maintenance, query explanations, and data cleaning. Prior approaches rewrite SQL queries to also track lineage, but can slow query execution in analytical engines that are designed to process complex query patterns on large datasets. Moreover, they mainly capture lineage at the logical level. SmokedDuck extends DuckDB to support fast lineage capture and querying, tracking lineage at the instruction level by leveraging the duality between lineage and data movement. In this demonstration, we show how a user can leverage operator-level lineage to understand and debug a query execution through SQLStepper: an application built on top of SmokedDuck. Users upload data and execute queries using an in-browser command line, then explore query-level and operator-level lineage visually to track down bugs.
Citations: 0
Future of Database System Architectures
Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589360
G. Alonso, N. Ailamaki, S. Krishnamurthy, S. Madden, S. Sivasubramanian, R. Ramakrishnan
Abstract: Over the past two decades, we have experienced major technology disruptions on multiple fronts, none bigger than the emergence of cloud computing, which has led to fundamental changes in how database software is architected. We are seeing several new trends that are similarly shaping the future of data management. With the demise of Moore's Law, we are now seeing a lot of interest (and start-ups with significant investments) in hardware database accelerators, exploring FPGAs, GPUs, and more. Economies of scale in the cloud make it possible to move to hardware many things that were done in software; this trend will continue and increase. Modern data estates are spread across data located on premises, on the edge, and in one or more public clouds, and across various sources like multiple relational databases, file and storage systems, and NoSQL systems, both operational and analytic. This phenomenon is referred to as data sprawl. We are also seeing the emergence of many novel data workloads. For example, rich data pipelines are an increasingly common workload. And finally, machine learning is having a rapidly increasing role in every aspect of the database software lifecycle. This SIGMOD panel will discuss the impact of the above changes and trends on database hardware and software architectures. How will these changes impact DB system design? What will DB systems look like in the near future? Where are the hardest research challenges? What learnings from the past will guide us through these disruptions?
Citations: 0
A Demonstration of KAMEL: A Scalable BERT-based System for Trajectory Imputation
Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589733
Mashaal Musleh, M. Mokbel
Abstract: This demo presents KAMEL, a novel trajectory imputation framework that aims to impute sparse trajectories as a means of increasing their accuracy, and hence the accuracy of their applications. Unlike the large majority of current trajectory imputation techniques, KAMEL does not require the knowledge or the availability of the underlying road network, which makes it applicable to important applications like map inference that need to infer the road network itself. The audience will experience KAMEL through various scenarios that show the imputation accuracy as well as KAMEL internals.
Citations: 4
Demonstrating NaturalMiner: Searching Large Data Sets for Abstract Patterns Described in Natural Language
Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589694
Immanuel Trummer
Abstract: The NaturalMiner system seeks to extract facts from large relational data sets that match abstract patterns defined in natural language. For instance, this enables users to search, with regard to a specific airline, for evidence that "the airline underperforms" or "the airline outperforms" within a data set containing flight statistics, hinting at areas for improvements or strengths to advertise. Internally, NaturalMiner iteratively generates statistical facts from data by processing SQL queries, selecting facts to generate by a reinforcement learning approach. It uses pre-trained language models to score candidate facts with regard to user-specified search patterns, returning the fact combination with maximal score after a user-specified time budget. To deal with large data sets, NaturalMiner features customized caching and sampling strategies. The proposed demonstration will showcase search for different patterns described in natural language, covering different data sets and scenarios.
Citations: 1
Optimizing Tensor Computations: From Applications to Compilation and Runtime Techniques
Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589407
Matthias Boehm, Matteo Interlandi, Christopher M. Jermaine
Abstract: Machine learning (ML) training and scoring fundamentally rely on linear algebra programs and more general tensor computations. Most ML systems utilize distributed parameter servers and similar distribution strategies for mini-batch stochastic gradient descent training. However, many more tasks in the data science and engineering lifecycle can benefit from efficient tensor computations. Examples include primitives for data cleaning, data and model debugging, data augmentation, query processing, numerical simulations, as well as a wide variety of training and scoring algorithms. In this survey tutorial, we first make a case for the importance of optimizing more general tensor computations, and then provide an in-depth survey of existing applications, optimizing compilation techniques, and underlying runtime strategies. Interestingly, there are close connections to data-intensive applications, query rewriting and optimization, as well as query processing and physical design. Our goal for the tutorial is to structure existing work, create common terminology, and identify open research challenges.
Citations: 0
Personal Data for Personal Use: Vision or Reality?
Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589378
X. Dong, Bo Li, Julia Stoyanovich, A. Tung, G. Weikum, A. Halevy, Wang-Chiew Tan
Abstract: The vision of collecting all of one's personal information into one searchable database has been around at least since Vannevar Bush's 1945 paper on the Memex System [2]. In the late 1990s, Gordon Bell and his colleagues at Microsoft Research built MyLifeBits [1, 6], which was the first serious attempt to build such a database. Since then, there has been continued interest in our community in building personal information management systems [3-5, 7, 8, 10]. Recently, the Solid Project proposed a more radical approach to personal information, arguing that all of one's data should reside in their own data pod, and applications should be redesigned to fetch data from the pod [9].
Citations: 0
Fairness in Ranking: From Values to Technical Choices and Back
Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589405
Julia Stoyanovich, Meike Zehlike, Ke Yang
Abstract: In the past few years, there has been much work on incorporating fairness requirements into the design of algorithmic rankers, with contributions from the data management, algorithms, information retrieval, and recommender systems communities. In this tutorial, we give a systematic overview of this work, offering a broad perspective that connects formalizations and algorithmic approaches across subfields. During the first part of the tutorial, we present a classification framework for fairness-enhancing interventions, along which we will then relate the technical methods. This framework allows us to unify the presentation of mitigation objectives and of algorithmic techniques to help meet those objectives or identify trade-offs. Next, we discuss fairness in score-based ranking and in supervised learning-to-rank. We conclude with recommendations for practitioners, to help them select a fair ranking method based on the requirements of their specific application domain.
Citations: 0
SparkSQL+: Next-generation Query Planning over Spark
Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589715
Binyang Dai, Qichen Wang, K. Yi
Abstract: We will demonstrate SparkSQL+, a SQL processing engine built on top of Spark. Unlike the vanilla SparkSQL that uses classical query plans, SparkSQL+ adopts several recently developed query plans, including generalized hypertree decompositions (GHD), worst-case optimal join (WCOJ) algorithms, and conjunctive queries with comparisons (CQC). SparkSQL+ also provides a platform for users to explore different query plans for a given query through a web-based interface, and compare their performance with classical query plans on the same Spark core.
Citations: 0