Proceedings of the 35th International Conference on Scientific and Statistical Database Management: Latest Papers

A Computer Vision Approach for Detecting Discrepancies in Map Textual Labels
Abdulrahman Salama, Mahmoud Elkamhawy, Mohamed Ali, Ehab Al-Masri, Adel Sabour, Abdeltawab M. Hendawi, Ming Tan, Vashutosh Agrawal, Ravi Prakash
{"title":"A Computer Vision Approach for Detecting Discrepancies in Map Textual Labels","authors":"Abdulrahman Salama, Mahmoud Elkamhawy, Mohamed Ali, Ehab Al-Masri, Adel Sabour, Abdeltawab M. Hendawi, Ming Tan, Vashutosh Agrawal, Ravi Prakash","doi":"10.1145/3603719.3603722","DOIUrl":"https://doi.org/10.1145/3603719.3603722","url":null,"abstract":"Maps provide various sources of information. An important example of such information is textual labels such as cities, neighborhoods, and street names. Although we treat this information as facts, and despite the massive effort made by providers to continuously improve their accuracy, this data is far from perfect. Discrepancies in textual labels rendered on the map are one of the major sources of inconsistencies across map providers. These discrepancies can have significant impacts on the reliability of the derived information and decision-making processes. Thus, it is important to validate the accuracy and consistency of such data. Most providers treat this data as proprietary and do not make it available to the public, so we cannot compare the data directly. To address these challenges, we introduce a novel computer vision-based approach for automatically extracting and classifying labels based on the visual characteristics of the label, which indicate its category according to the format convention used by the specific map provider. Based on the extracted data, we detect the degree of discrepancies across map providers. We consider three map providers: Bing Maps, Google Maps, and OpenStreetMap. The neural network we develop classifies the text labels with an accuracy of up to 93% across all providers. We leverage our system to analyze randomly selected regions in different markets. The studied markets are the USA, Germany, France, and Brazil. Experimental results and statistical analysis reveal the amount of discrepancy across map providers per region. We calculate the Jaccard distance between the extracted text sets for each pair of map providers, which represents the discrepancy percentage. Discrepancy percentages as high as 90% were found in some markets.","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121443821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
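The discrepancy measure named in the map-label abstract above, the Jaccard distance between the label sets extracted for each pair of providers, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the provider keys and label strings are made-up sample data, and the choice to treat two empty sets as identical is an assumption.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a ∩ b| / |a ∪ b| for two collections of labels."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty label sets are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_discrepancy(label_sets):
    """Map each unordered provider pair to its Jaccard distance."""
    return {
        (p, q): jaccard_distance(label_sets[p], label_sets[q])
        for p, q in combinations(sorted(label_sets), 2)
    }


# Hypothetical labels extracted for one region (illustrative only).
labels = {
    "provider_a": {"Main St", "Oak Ave", "Riverside"},
    "provider_b": {"Main St", "Oak Ave", "Hill Rd"},
    "provider_c": {"Main St", "Elm St"},
}

for pair, distance in pairwise_discrepancy(labels).items():
    print(pair, f"{distance:.0%}")
```

A distance of 0 means both providers show exactly the same labels, and a distance of 1 means they share none, which is why the value can be read directly as a discrepancy percentage.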
ESM2-Tree: An maintenance efficient authentication data structure in blockchain
Yuzhou Fang, Liang Cai, Weiwei Qiu, Fanglei Huang, Huaihai Hui
{"title":"ESM2-Tree: An maintenance efficient authentication data structure in blockchain","authors":"Yuzhou Fang, Liang Cai, Weiwei Qiu, Fanglei Huang, Huaihai Hui","doi":"10.1145/3603719.3603721","DOIUrl":"https://doi.org/10.1145/3603719.3603721","url":null,"abstract":"Blockchain technology is gaining broader attention. Owing to its immutability property and Byzantine fault-tolerant consensus protocol, blockchain offers a brand new trusted data-sharing solution. Some researchers use blockchain to drive autonomous collaboration among smart devices, which face massive spatial data updates and usage. The key challenge lies in designing an authenticated data structure (ADS) that can efficiently process spatial data and queries. However, previous schemes either could not handle spatial data efficiently or did not consider the efficiency of frequent data updates. In this paper, we take a step toward implementing a maintenance-efficient ADS on the blockchain, called ESM2-Tree, which is not only good at processing spatial data but also effective in supporting authenticated spatial queries by partitioning and merging data at different granularities. Theoretical analysis and empirical evaluation validate the performance of our ADS, which reduces the overall data structure maintenance overhead by about 50% in a uniform data distribution scenario.","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126622780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
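To make the term "authenticated data structure" in the ESM2-Tree abstract above concrete, here is a sketch of a plain binary Merkle tree, the textbook ADS. It is not ESM2-Tree: the abstract does not describe that structure's layout, so this only shows the general idea that a single root digest commits to a whole set of records. The record strings, the domain-separation prefixes, and the rule of promoting an unpaired node unchanged are all illustrative assumptions.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).digest()


def merkle_root(records):
    """Return a root digest that commits to an ordered list of byte records."""
    if not records:
        return _h(b"")
    # Prefix 0x00 marks leaves and 0x01 marks inner nodes (an assumption
    # made here so a leaf can never be confused with an inner node).
    level = [_h(b"\x00" + record) for record in records]
    while len(level) > 1:
        parents = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            # An unpaired last node is promoted to the next level unchanged.
            parents.append(level[-1])
        level = parents
    return level[0]


# Hypothetical spatial records (device id and coordinates are made up).
records = [b"device-1:52.52,13.40", b"device-2:48.85,2.35", b"device-3:40.71,-74.00"]
root = merkle_root(records)
print(root.hex())
```

Changing any record changes the root, so a verifier holding only the root can detect tampering. In this plain tree an update rehashes one path from the changed leaf to the root; that per-update maintenance work is the kind of overhead the abstract says ESM2-Tree reduces.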
Accelerating Machine Learning Queries with Linear Algebra Query Processing
Wenbo Sun, Asterios Katsifodimos, Rihan Hai
{"title":"Accelerating Machine Learning Queries with Linear Algebra Query Processing","authors":"Wenbo Sun, Asterios Katsifodimos, Rihan Hai","doi":"10.1145/3603719.3603726","DOIUrl":"https://doi.org/10.1145/3603719.3603726","url":null,"abstract":"The rapid growth of large-scale machine learning (ML) models has led numerous commercial companies to utilize ML models for generating predictive results to help business decision-making. As two primary components in traditional predictive pipelines, data processing and model prediction often operate in separate execution environments, leading to redundant engineering and computations. Additionally, the diverging mathematical foundations of data processing and machine learning hinder cross-optimizations that combine these two components, thereby overlooking potential opportunities to expedite predictive pipelines. In this paper, we propose an operator fusing method based on GPU-accelerated linear algebraic evaluation of relational queries. Our method leverages linear algebra computation properties to merge operators in machine learning predictions and data processing, significantly accelerating predictive pipelines by up to 317x. We perform a complexity analysis to deliver quantitative insights into the advantages of operator fusion, considering various data and model dimensions. Furthermore, we extensively evaluate matrix multiplication query processing utilizing the widely-used Star Schema Benchmark. Through comprehensive evaluations, we demonstrate the effectiveness and potential of our approach in improving the efficiency of data processing and machine learning workloads on modern hardware.","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134464156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
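The operator-fusion idea in the abstract above rests on a property of linear algebra: if a join is expressed as a matrix product and the model is linear, the two products can be regrouped. The sketch below illustrates only that regrouping, on the CPU and in pure Python; it is not the paper's GPU implementation. The one-hot join matrix, the feature table, the weight vector, and the function names are all assumptions chosen for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def predict_unfused(join, features, weights):
    """Materialize the join result first, then apply the linear model."""
    joined = matmul(join, features)  # one feature row per fact row
    return matvec(joined, weights)


def predict_fused(join, features, weights):
    """Apply the model to the dimension table once, then route the scores."""
    scores = matvec(features, weights)  # one score per dimension row
    return matvec(join, scores)


# Hypothetical data: 3 fact rows referencing 2 dimension rows.
join = [[1, 0], [0, 1], [1, 0]]       # one-hot foreign-key matrix
features = [[1.0, 2.0], [3.0, 4.0]]   # dimension-table features
weights = [0.5, -1.0]                 # linear model weights

print(predict_unfused(join, features, weights))
print(predict_fused(join, features, weights))
```

Both functions compute the same predictions because `(J F) w = J (F w)`. The fused order never builds the joined feature matrix, which is where the savings come from when the fact table is much larger than the dimension table.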
Selecting Efficient Cluster Resources for Data Analytics: When and How to Allocate for In-Memory Processing?
Jonathan Will, L. Thamsen, Dominik Scheinert, O. Kao
{"title":"Selecting Efficient Cluster Resources for Data Analytics: When and How to Allocate for In-Memory Processing?","authors":"Jonathan Will, L. Thamsen, Dominik Scheinert, O. Kao","doi":"10.1145/3603719.3603733","DOIUrl":"https://doi.org/10.1145/3603719.3603733","url":null,"abstract":"Distributed dataflow systems such as Apache Spark or Apache Flink enable parallel, in-memory data processing on large clusters of commodity hardware. Consequently, the appropriate amount of memory to allocate to the cluster is a crucial consideration. In this paper, we analyze the challenge of efficient resource allocation for distributed data processing, focusing on memory. We emphasize that in-memory processing with in-memory data processing frameworks can undermine resource efficiency. Based on the findings of our trace data analysis, we compile requirements towards an automated solution for efficient cluster resource allocation.","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117190771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Indexing Temporal Relations for Range-Duration Queries
Matteo Ceccarello, Anton Dignös, J. Gamper, Christina Khnaisser
{"title":"Indexing Temporal Relations for Range-Duration Queries","authors":"Matteo Ceccarello, Anton Dignös, J. Gamper, Christina Khnaisser","doi":"10.1145/3603719.3603732","DOIUrl":"https://doi.org/10.1145/3603719.3603732","url":null,"abstract":"Temporal information plays a crucial role in many database applications; however, support for queries on such data is limited. We present an index structure, termed RD-index, to support range-duration queries over interval timestamped relations, which constrain both the range of the tuples’ positions on the timeline and their duration. RD-index is a grid structure in the two-dimensional space, representing the position on the timeline and the duration of timestamps, respectively. Instead of using a regular grid, we consider the data distribution for the construction of the grid in order to ensure that each grid cell contains approximately the same number of intervals. RD-index features provable bounds on the running time of all the operations, allows for a simple implementation, and supports very predictable query performance. We benchmark our solution on a variety of datasets and query workloads, investigating both the query rate and the behavior of the individual queries. The results show that RD-index performs better than the baselines on range-duration queries, for which it is explicitly designed. Furthermore, it also outperforms state-of-the-art indexes on mixed workloads containing queries that constrain either only the duration or only the range along with range-duration queries. Finally, the size of the RD-index is in all settings smaller than that of the competitors.","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115646788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
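The range-duration query and the data-driven grid described in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the RD-index itself. Several choices are assumptions, because the abstract does not specify them: intervals are closed `(start, end)` pairs, "range" is read as overlap with the query range, columns are cut by start time and then cells by duration, and the function names and sample data are made up.

```python
def duration(interval):
    """Length of a closed (start, end) interval."""
    return interval[1] - interval[0]


def split_evenly(items, parts):
    """Split a list into consecutive chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), parts)
    chunks, pos = [], 0
    for i in range(parts):
        step = size + (1 if i < extra else 0)
        chunks.append(items[pos:pos + step])
        pos += step
    return chunks


def build_grid(intervals, cols, rows):
    """Equal-frequency grid: columns by start time, then cells by duration."""
    cells = []
    for column in split_evenly(sorted(intervals), cols):
        for cell in split_evenly(sorted(column, key=duration), rows):
            if cell:
                cells.append({
                    "min_start": min(s for s, _ in cell),
                    "max_end": max(e for _, e in cell),
                    "min_dur": duration(cell[0]),
                    "max_dur": duration(cell[-1]),
                    "items": cell,
                })
    return cells


def matches(interval, r_start, r_end, d_min, d_max):
    """True if the interval overlaps [r_start, r_end] and its duration is in [d_min, d_max]."""
    start, end = interval
    return start <= r_end and end >= r_start and d_min <= end - start <= d_max


def query(cells, r_start, r_end, d_min, d_max):
    """Skip cells whose summaries rule out a match, then filter the rest."""
    hits = []
    for cell in cells:
        if cell["min_start"] > r_end or cell["max_end"] < r_start:
            continue
        if cell["max_dur"] < d_min or cell["min_dur"] > d_max:
            continue
        hits.extend(iv for iv in cell["items"] if matches(iv, r_start, r_end, d_min, d_max))
    return sorted(hits)


# Hypothetical intervals (illustrative only).
intervals = [(1, 4), (2, 10), (3, 5), (6, 7), (8, 15), (9, 11), (12, 13), (14, 20)]
grid = build_grid(intervals, cols=2, rows=2)
print(query(grid, r_start=4, r_end=9, d_min=1, d_max=3))
```

Because every cell holds roughly the same number of intervals, the work done per visited cell is bounded regardless of how skewed the data is, which is the intuition behind the predictable query performance the abstract reports.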
Proceedings of the 35th International Conference on Scientific and Statistical Database Management
{"title":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","authors":"","doi":"10.1145/3603719","DOIUrl":"https://doi.org/10.1145/3603719","url":null,"abstract":"","PeriodicalId":314512,"journal":{"name":"Proceedings of the 35th International Conference on Scientific and Statistical Database Management","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125538792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0